Ivan
@ivanbokii
2.5K posts

Software Engineer. Interested in system design and hammock-driven development. Edinburgh walker.

Edinburgh, Scotland · Joined December 2008
646 Following · 283 Followers
Ivan retweeted
François Chollet@fchollet·
This is more evidence that current frontier models remain completely reliant on content-level memorization, as opposed to higher-level generalizable knowledge (such as metalearning knowledge, problem-solving strategies...)
Lossfunk@lossfunk

🚨 Shocking: Frontier LLMs score 85-95% on standard coding benchmarks. We gave them equivalent problems in languages they couldn't have memorized. They collapsed to 0-11%. Presenting EsoLang-Bench. Accepted to the Logical Reasoning and ICBINB workshops at ICLR 2026 🧵

Ivan retweeted
Joscha Bach@Plinz·
@AnnaLeptikon Computers used to be unforgiving. I wonder what will happen if the next generation of computer scientists does not grow up with “syntax error in line 20” but with “you are absolutely right, let me try…”
Ivan@ivanbokii·
@headinthebox It seems you're conflating skepticism about AI as a technology with a criticism of the current state of AI coding.
English
0
0
0
49
Ivan
Ivan@ivanbokii·
@headinthebox @darin_gordon Anyone practicing AI/agentic coding on a daily basis sees how AI struggles to maintain its own code due to quickly compounding complexity. You'd have to assume AI is a perfect coder to deny this. Obviously a flawed assumption.
Erik Meijer@headinthebox·
@darin_gordon That assumes that humans will have to look at the code and maintain it. Obviously a flawed assumption.
Ivan@ivanbokii·
It’s quite unfortunate that GEPA Optimize Anything didn’t get enough traction, while very, very similar ideas promoted by Karpathy’s autoresearch and Lütke’s pi-autoresearch got so much traction, despite being less general
Ivan@ivanbokii·
@fatih Concave with replaceable switches would be the dream, but afaik there's no good way to make it work, specifically because of the shape. I ended up manually resoldering switches on the 2/360 for the same reason - the stock switches are just plain bad. Thank you for the reply 🙌
Fatih Arslan@fatih·
Thank you Ivan. I also own several Kinesis, Glove80 and many others. Except the Kinesis, all the others feel finicky. Second, they don't let me swap switches (including the Kinesis). I love the Kinesis, but the switches are very bad, and living in Turkey, there is no way I can customize it. Yes, I do miss the concave, but the Elora makes up for that with aggressive staggering, and now with my own design (which is tilted in both the X and Y axes), it also fits my hands perfectly. Also, this is just a hobby for me; I'm trying to master ID and design by doing this. My grand plan is to create my OWN concave PCB and keyboard. I haven't started on that yet, but we'll do it later.
Fatih Arslan@fatih·
I'm pretty happy with the final result. It's molded and designed for my own ergonomics, looks great, and feels great to type on (it has a heavy 600-gram base). It's one of a kind. In a parallel universe I would create a CNC'd aluminum base, but that's for another life.
[4 images attached]
Ivan@ivanbokii·
@techdevdaily @mitsuhiko @thekitze I think it’s a general social media psychosis. “fuck everything… lock in… generational wealth” - a weird x/AI distortion of reality.
Armin Ronacher ⇌@mitsuhiko·
Not to call out @thekitze, but there are quite a few people who are "don't care that the world is going to shit because we can vibecode ourselves to generational wealth," and I find that a tad … disturbing? Is this really the current developer Zeitgeist? x.com/thekitze/statu…
kitze 🛠️ tinkerer.club@thekitze

bro fuck the news fuck the president fuck the leaked files fuck rumors fuck gossip fuck doomscrolling fuck celebrities fuck sidney sweeney fuck sending reels to your friends fuck gaming fuck binging shows fuck everything NOW IS THE CRAZIEST TIME TO LOCK THE FUCK IN AND MAKE GENERATIONAL WEALTH. DO NOT FUMBLE THIS!!!!!

Ivan@ivanbokii·
@jnardiello @fatih @RepoPrompt has implemented a very nice approach: managed file selection forms the context, which is exposed to the pairing models through a standalone tool that is part of their MCP server
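A minimal sketch of the general pattern described here, not RepoPrompt's actual code. It assumes the official `mcp` Python SDK; the server name, the `selected_files` store, and the `get_context` tool are all hypothetical:

```python
from pathlib import Path
from mcp.server.fastmcp import FastMCP

# Hypothetical sketch (not RepoPrompt's implementation): the MCP server
# holds a user-managed file selection and exposes it as a single tool,
# so every "pairing" model pulls the same curated context.
mcp = FastMCP("context-server")
selected_files: list[str] = []  # paths the user has picked in the UI

@mcp.tool()
def get_context() -> str:
    """Return the managed file selection as one context blob."""
    parts = [f"## {p}\n{Path(p).read_text()}" for p in selected_files]
    return "\n\n".join(parts)

if __name__ == "__main__":
    mcp.run()  # serve over stdio so a CLI agent can attach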
Ivan@ivanbokii·
@jnardiello @fatih For coding, how exactly is the context shared between the main driving model and the models exposed through zen mcp? I.e. if the main model has already traversed the filesystem and collected the required code, how is that code shared with the "pairing" models?
Fatih Arslan@fatih·
I'm very impressed by Opus 4.1. It burns a lot of tokens, but it's a lot smarter than Sonnet at understanding things. I use it when Sonnet is stuck and going in circles, and it can usually one-shot fix issues.
eric provencher@pvncher·
Some of the power of the improved search tool in @RepoPrompt MCP combined with @claude_code. It's able to look for the specific methods it needs in a given file. With the new edit tools it's also a lot faster at churning through tasks!
[image attached]
Ivan@ivanbokii·
@pvncher @RepoPrompt What’s the point of exposing navigation and editing through RepoPrompt when CLI agents already have their own toolsets? You can also tap into 2.5 Pro from Claude Code using "gemini -p" already. It’s almost like CLI tools have made your PMF redundant.
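A rough sketch of the tap-in mentioned above, assuming the Gemini CLI is installed on PATH and that `-p` runs a one-shot, non-interactive prompt as the tweet suggests; the `ask_gemini` helper and the example prompt are hypothetical:

```python
import subprocess

# Hypothetical helper: shell out to the Gemini CLI for a one-shot answer,
# e.g. from a Claude Code hook or custom command.
def ask_gemini(prompt: str) -> str:
    result = subprocess.run(
        ["gemini", "-p", prompt],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()

if __name__ == "__main__":
    print(ask_gemini("Give a second opinion on this plan: refactor the parser into a visitor."))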
Ivan@ivanbokii·
@pvncher @RepoPrompt I feel like CLI tools like Claude Code, OpenCode, AMP, Gemini-CLI have made RepoPrompt a bit redundant. It seems like you’re pivoting hard from a tool that composes context for AI web chat interfaces to an MCP server that’s primarily useful for people who aren’t using CLI agents.
eric provencher@pvncher·
Coming soon - Claude can agentically invoke @RepoPrompt's built-in chat to leverage the powerful delegate file editing workflow with o3 or Gemini 2.5 Pro, relying on smarter models to engineer changes for it
[image attached]
Ivan@ivanbokii·
ugh, forgot to mention that I'm referring to Claude Code and not the web interface
Ivan@ivanbokii·
Hey @AnthropicAI folks, I’m on the $100 Claude subscription, and I just started my 5-hour window but am already seeing “Claude Opus 4 limit reached, now using Sonnet 4.” Had a similar experience yesterday. I understand that Opus 4 counts 5x in terms of consumption, but it’s clearly off.
Ivan@ivanbokii·
@daylightco Can you please tell me when you are going to ship the amber Sunday orders?
daylight@daylightco·
and if you are an active deposit holder but have yet to convert to a full order, email us ASAP hello@daylightcomputer.com
daylight@daylightco·
we are officially back on sale! (for good this time)
• no more $100 deposits
• added accessories to complement your DC-1
• <2 month ship time
only a small quantity of units left for shipment by end of April 👀
Ivan retweeted
Andrej Karpathy@karpathy·
I don't have too too much to add on top of this earlier post on V3 and I think it applies to R1 too (which is the more recent, thinking equivalent).

I will say that Deep Learning has a legendary ravenous appetite for compute, like no other algorithm that has ever been developed in AI. You may not always be utilizing it fully but I would never bet against compute as the upper bound for achievable intelligence in the long run. Not just for an individual final training run, but also for the entire innovation / experimentation engine that silently underlies all the algorithmic innovations.

Data has historically been seen as a separate category from compute, but even data is downstream of compute to a large extent - you can spend compute to create data. Tons of it. You've heard this called synthetic data generation, but less obviously, there is a very deep connection (equivalence even) between "synthetic data generation" and "reinforcement learning". In the trial-and-error learning process in RL, the "trial" is model generating (synthetic) data, which it then learns from based on the "error" (/reward). Conversely, when you generate synthetic data and then rank or filter it in any way, your filter is straight up equivalent to a 0-1 advantage function - congrats you're doing crappy RL.

Last thought. Not sure if this is obvious. There are two major types of learning, in both children and in deep learning. There is 1) imitation learning (watch and repeat, i.e. pretraining, supervised finetuning), and 2) trial-and-error learning (reinforcement learning). My favorite simple example is AlphaGo - 1) is learning by imitating expert players, 2) is reinforcement learning to win the game. Almost every single shocking result of deep learning, and the source of all *magic* is always 2. 2 is significantly significantly more powerful. 2 is what surprises you. 2 is when the paddle learns to hit the ball behind the blocks in Breakout. 2 is when AlphaGo beats even Lee Sedol. And 2 is the "aha moment" when the DeepSeek (or o1 etc.) discovers that it works well to re-evaluate your assumptions, backtrack, try something else, etc. It's the solving strategies you see this model use in its chain of thought. It's how it goes back and forth thinking to itself. These thoughts are *emergent* (!!!) and this is actually seriously incredible, impressive and new (as in publicly available and documented etc.). The model could never learn this with 1 (by imitation), because the cognition of the model and the cognition of the human labeler is different. The human would never know to correctly annotate these kinds of solving strategies and what they should even look like. They have to be discovered during reinforcement learning as empirically and statistically useful towards a final outcome.

(Last last thought/reference this time for real is that RL is powerful but RLHF is not. RLHF is not RL. I have a separate rant on that in an earlier tweet x.com/karpathy/statu…)
Andrej Karpathy@karpathy

DeepSeek (Chinese AI co) making it look easy today with an open weights release of a frontier-grade LLM trained on a joke of a budget (2048 GPUs for 2 months, $6M). For reference, this level of capability is supposed to require clusters of closer to 16K GPUs, the ones being brought up today are more around 100K GPUs. E.g. Llama 3 405B used 30.8M GPU-hours, while DeepSeek-V3 looks to be a stronger model at only 2.8M GPU-hours (~11X less compute). If the model also passes vibe checks (e.g. LLM arena rankings are ongoing, my few quick tests went well so far) it will be a highly impressive display of research and engineering under resource constraints. Does this mean you don't need large GPU clusters for frontier LLMs? No but you have to ensure that you're not wasteful with what you have, and this looks like a nice demonstration that there's still a lot to get through with both data and algorithms. Very nice & detailed tech report too, reading through.

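A toy sketch of the "filtering synthetic data is crappy RL" equivalence from the post above: keeping or dropping a sampled completion acts as a 0-1 advantage function. `model`, `accept`, and `finetune` are hypothetical stand-ins, not a real training stack:

```python
# Keeping a sample is advantage 1; dropping it is advantage 0. Finetuning
# on the survivors is then one crude policy-gradient-like update: the
# model imitates its own accepted trials.
def rejection_sample(model, prompts, accept):
    kept = []
    for p in prompts:
        y = model(p)                          # "trial": model generates (synthetic) data
        advantage = 1 if accept(p, y) else 0  # "error"/reward collapsed to 0 or 1
        if advantage == 1:
            kept.append((p, y))               # only advantage-1 samples survive
    return kept

# Usage (hypothetical): finetune(model, rejection_sample(model, prompts, accept))
```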
Ivan@ivanbokii·
@daylightco still haven't received my shipment notification for batch 5. Should I be worried?
daylight@daylightco·
Batch 5 units will start shipping tomorrow!
[GIF attached]
Ivan retweeted
Gonzalo Cordova@gonzalo_io·
Been diving into some great AI/ML blogs lately:
- Blog by @chipro
- Blog by @eugeneyan
(links below)

I'm curious to find new gems. What are you reading these days?