Ivan
@ivanbokii
2.5K posts

Software Engineer. Interested in system design and hammock-driven development. Edinburgh walker.

Edinburgh, Scotland · Joined December 2008
646 Following · 283 Followers
Ivan retweeted
François Chollet@fchollet·
This is more evidence that current frontier models remain completely reliant on content-level memorization, as opposed to higher-level generalizable knowledge (such as metalearning knowledge, problem-solving strategies...)
Lossfunk@lossfunk

🚨 Shocking: Frontier LLMs score 85-95% on standard coding benchmarks. We gave them equivalent problems in languages they couldn't have memorized. They collapsed to 0-11%. Presenting EsoLang-Bench. Accepted to the Logical Reasoning and ICBINB workshops at ICLR 2026 🧵

Ivan retweeted
Joscha Bach@Plinz·
@AnnaLeptikon Computers used to be unforgiving. I wonder what will happen if the next generation of computer scientists does not grow up with “syntax error in line 20” but with “you are absolutely right, let me try…”
Ivan@ivanbokii·
@headinthebox It seems you're conflating skepticism about AI as a technology with a criticism of the current state of AI coding.
English
0
0
0
49
Ivan
Ivan@ivanbokii·
@headinthebox @darin_gordon Anyone practicing AI/agentic coding on a daily basis sees how AI struggles to maintain its own code due to quickly compounding complexity. You'd have to assume AI is a perfect coder to deny this. Obviously a flawed assumption.
Erik Meijer@headinthebox·
@darin_gordon That assumes that humans will have to look at the code and maintain it. Obviously a flawed assumption.
Ivan@ivanbokii·
It’s quite unfortunate that GEPA Optimize Anything didn’t get enough traction, while very, very similar ideas promoted by Karpathy’s autoresearch and Lütke’s pi-autoresearch got so much traction, despite being less general
Ivan@ivanbokii·
@fatih Concave with replaceable switches would be the dream, but afaik there's no good way to make it work, specifically because of the shape. I ended up manually resoldering switches on the 2/360 for the same reason - the stock switches are just plain bad. Thank you for the reply 🙌
Fatih Arslan@fatih·
Thank you Ivan. I also own several Kinesis, Glove80 and many others. Except the Kinesis, all the others feel finicky. Second, they don't let me swap switches (including the Kinesis). I love the Kinesis, but the switches are very bad, and living in Turkey, there is no way I can customize it. Yes, I do miss the concave, but the Elora makes up for that with aggressive staggering, and now with my own design (which is tilted in both the X and Y axes), it also fits my hands perfectly. Also, this is just a hobby for me; I'm trying to master ID and design by doing this. My grand plan is to create my OWN concave PCB and keyboard. I haven't started on that yet, but we'll do it later.
Fatih Arslan@fatih·
I'm pretty happy with the final result. It's molded and designed for my own ergonomics, looks great, and feels great to type on (it has a heavy 600-gram base). It's one of a kind. In a parallel universe I would create a CNC'd aluminum base, but that's for another life.
[4 images attached]
Ivan@ivanbokii·
@techdevdaily @mitsuhiko @thekitze I think it’s a general social media psychosis. “fuck everything… lock in… generational wealth” - a weird x/AI distortion of reality.
Armin Ronacher ⇌@mitsuhiko·
Not to call out @thekitze, but there are quite a few people who are "don't care that the world is going to shit because we can vibecode ourselves to generational wealth," and I find that a tad … disturbing? Is this really the current developer Zeitgeist? x.com/thekitze/statu…
kitze 🛠️ tinkerer.club@thekitze

bro fuck the news fuck the president fuck the leaked files fuck rumors fuck gossip fuck doomscrolling fuck celebrities fuck sidney sweeney fuck sending reels to your friends fuck gaming fuck binging shows fuck everything NOW IS THE CRAZIEST TIME TO LOCK THE FUCK IN AND MAKE GENERATIONAL WEALTH. DO NOT FUMBLE THIS!!!!!

Ivan@ivanbokii·
@jnardiello @fatih @RepoPrompt has implemented a very nice approach: managed file selection forms the context, which is exposed to the pairing models through a standalone tool that is part of their MCP server
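A minimal sketch of the general pattern described here, not RepoPrompt's actual code. It assumes the official `mcp` Python SDK; the server name, the `selected_files` store, and the `get_context` tool are all hypothetical:

```python
from pathlib import Path
from mcp.server.fastmcp import FastMCP

# Hypothetical sketch (not RepoPrompt's implementation): the MCP server
# holds a user-managed file selection and exposes it as a single tool,
# so every "pairing" model pulls the same curated context.
mcp = FastMCP("context-server")
selected_files: list[str] = []  # paths the user has picked in the UI

@mcp.tool()
def get_context() -> str:
    """Return the managed file selection as one context blob."""
    parts = [f"## {p}\n{Path(p).read_text()}" for p in selected_files]
    return "\n\n".join(parts)

if __name__ == "__main__":
    mcp.run()  # serve over stdio so a CLI agent can attach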
Ivan@ivanbokii·
@jnardiello @fatih For coding, how exactly is the context shared between the main driving model and the models exposed through zen mcp? I.e. if the main model has already traversed the filesystem and collected the required code, how is that code shared with the "pairing" models?
Fatih Arslan@fatih·
I'm very impressed by Opus 4.1. It burns a lot of tokens, but it's a lot smarter than Sonnet at understanding things. I use it when Sonnet is stuck and going in circles, and it can usually one-shot fix issues.
eric provencher@pvncher·
Some of the power of the improved search tool in @RepoPrompt MCP combined with @claude_code. It's able to look for the specific methods it needs in a given file. With the new edit tools it's also a lot faster at churning through tasks!
[image attached]
Ivan@ivanbokii·
@pvncher @RepoPrompt What’s the point of exposing navigation and editing through RepoPrompt when CLI agents already have their own toolsets? You can also tap into 2.5 Pro from Claude Code using "gemini -p" already. It’s almost like CLI tools have made your PMF redundant.
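A rough sketch of the tap-in mentioned above, assuming the Gemini CLI is installed on PATH and that `-p` runs a one-shot, non-interactive prompt as the tweet suggests; the `ask_gemini` helper and the example prompt are hypothetical:

```python
import subprocess

# Hypothetical helper: shell out to the Gemini CLI for a one-shot answer,
# e.g. from a Claude Code hook or custom command.
def ask_gemini(prompt: str) -> str:
    result = subprocess.run(
        ["gemini", "-p", prompt],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()

if __name__ == "__main__":
    print(ask_gemini("Give a second opinion on this plan: refactor the parser into a visitor."))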
Ivan@ivanbokii·
@pvncher @RepoPrompt I feel like CLI tools like Claude Code, OpenCode, AMP, Gemini-CLI have made RepoPrompt a bit redundant. It seems like you’re pivoting hard from a tool that composes context for AI web chat interfaces to an MCP server that’s primarily useful for people who aren’t using CLI agents.
eric provencher@pvncher·
Coming soon - Claude can agentically invoke @RepoPrompt's built-in chat to leverage the powerful delegate file editing workflow with o3 or Gemini 2.5 Pro, relying on smarter models to engineer changes for it
[image attached]
Ivan@ivanbokii·
ugh, forgot to mention that I'm referring to Claude Code and not the web interface
Ivan@ivanbokii·
Hey @AnthropicAI folks, I’m on the $100 Claude subscription, and I just started my 5-hour window but am already seeing “Claude Opus 4 limit reached, now using Sonnet 4.” Had a similar experience yesterday. I understand that Opus 4 counts 5x in terms of consumption, but it’s clearly off.
Ivan@ivanbokii·
@daylightco Can you please tell me when you are going to ship the amber Sunday orders?
daylight@daylightco·
and if you are an active deposit holder but have yet to convert to a full order, email us ASAP hello@daylightcomputer.com
daylight@daylightco·
we are officially back on sale! (for good this time)
• no more $100 deposits
• added accessories to complement your DC-1
• <2 month ship time
only a small quantity of units left for shipment by end of April 👀
Ivan retweeted
Andrej Karpathy@karpathy·
I don't have too too much to add on top of this earlier post on V3 and I think it applies to R1 too (which is the more recent, thinking equivalent).

I will say that Deep Learning has a legendary ravenous appetite for compute, like no other algorithm that has ever been developed in AI. You may not always be utilizing it fully but I would never bet against compute as the upper bound for achievable intelligence in the long run. Not just for an individual final training run, but also for the entire innovation / experimentation engine that silently underlies all the algorithmic innovations.

Data has historically been seen as a separate category from compute, but even data is downstream of compute to a large extent - you can spend compute to create data. Tons of it. You've heard this called synthetic data generation, but less obviously, there is a very deep connection (equivalence even) between "synthetic data generation" and "reinforcement learning". In the trial-and-error learning process in RL, the "trial" is model generating (synthetic) data, which it then learns from based on the "error" (/reward). Conversely, when you generate synthetic data and then rank or filter it in any way, your filter is straight up equivalent to a 0-1 advantage function - congrats you're doing crappy RL.

Last thought. Not sure if this is obvious. There are two major types of learning, in both children and in deep learning. There is 1) imitation learning (watch and repeat, i.e. pretraining, supervised finetuning), and 2) trial-and-error learning (reinforcement learning). My favorite simple example is AlphaGo - 1) is learning by imitating expert players, 2) is reinforcement learning to win the game. Almost every single shocking result of deep learning, and the source of all *magic* is always 2. 2 is significantly significantly more powerful. 2 is what surprises you. 2 is when the paddle learns to hit the ball behind the blocks in Breakout. 2 is when AlphaGo beats even Lee Sedol. And 2 is the "aha moment" when the DeepSeek (or o1 etc.) discovers that it works well to re-evaluate your assumptions, backtrack, try something else, etc. It's the solving strategies you see this model use in its chain of thought. It's how it goes back and forth thinking to itself. These thoughts are *emergent* (!!!) and this is actually seriously incredible, impressive and new (as in publicly available and documented etc.). The model could never learn this with 1 (by imitation), because the cognition of the model and the cognition of the human labeler is different. The human would never know to correctly annotate these kinds of solving strategies and what they should even look like. They have to be discovered during reinforcement learning as empirically and statistically useful towards a final outcome.

(Last last thought/reference this time for real is that RL is powerful but RLHF is not. RLHF is not RL. I have a separate rant on that in an earlier tweet x.com/karpathy/statu…)
Andrej Karpathy@karpathy

DeepSeek (Chinese AI co) making it look easy today with an open weights release of a frontier-grade LLM trained on a joke of a budget (2048 GPUs for 2 months, $6M). For reference, this level of capability is supposed to require clusters of closer to 16K GPUs, the ones being brought up today are more around 100K GPUs. E.g. Llama 3 405B used 30.8M GPU-hours, while DeepSeek-V3 looks to be a stronger model at only 2.8M GPU-hours (~11X less compute). If the model also passes vibe checks (e.g. LLM arena rankings are ongoing, my few quick tests went well so far) it will be a highly impressive display of research and engineering under resource constraints. Does this mean you don't need large GPU clusters for frontier LLMs? No but you have to ensure that you're not wasteful with what you have, and this looks like a nice demonstration that there's still a lot to get through with both data and algorithms. Very nice & detailed tech report too, reading through.

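A toy sketch of the "filtering synthetic data is crappy RL" equivalence from the post above: keeping or dropping a sampled completion acts as a 0-1 advantage function. `model`, `accept`, and `finetune` are hypothetical stand-ins, not a real training stack:

```python
# Keeping a sample is advantage 1; dropping it is advantage 0. Finetuning
# on the survivors is then one crude policy-gradient-like update: the
# model imitates its own accepted trials.
def rejection_sample(model, prompts, accept):
    kept = []
    for p in prompts:
        y = model(p)                          # "trial": model generates (synthetic) data
        advantage = 1 if accept(p, y) else 0  # "error"/reward collapsed to 0 or 1
        if advantage == 1:
            kept.append((p, y))               # only advantage-1 samples survive
    return kept

# Usage (hypothetical): finetune(model, rejection_sample(model, prompts, accept))
```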
Ivan@ivanbokii·
@daylightco still haven't received my shipment notification for batch 5. Should I be worried?
daylight@daylightco·
Batch 5 units will start shipping tomorrow!
[GIF attached]
Ivan retweeted
Gonzalo Cordova@gonzalo_io·
Been diving into some great AI/ML blogs lately:
- Blog by @chipro
- Blog by @eugeneyan
(links below)

I'm curious to find new gems. What are you reading these days?