Ben Duffy

784 posts

@benduffyMMM

Building robots which can see through walls. And humanoids. Co-founder at https://t.co/9YTX5UFnuG and https://t.co/MQhAdPbJ2a https://t.co/aDXue4Xob8

Berlin, Germany · Joined April 2016
418 Following · 443 Followers
Ben Duffy @benduffyMMM
Claude Opus 4.7 likes: Phases, tasks, gates, stages, rungs
Ben Duffy @benduffyMMM
A year ago, I asked all the LLMs of the time (Claude 3.7, Grok 3, DeepSeek V3, Gemini 2.5, and Deep Research) to predict the next 5 years of progress and, in part, to assess the plausibility of the AI 2027 report. One thing the report got right is that all the labs are focusing on recursive improvement, e.g. with Codex 5.3 helping create ChatGPT 5.4 and so on, i.e. "closing the loop". Anyway, this year: new prompt and new models. I'm getting more quantitative and will then ask ChatGPT 5.4 to summarise all the answers and compare. It's getting a bit meta to ask multiple AI agents to predict the future progress of AI and compare previous forecasts. Grok is supposed to be optimised for forecasting accuracy! Summary from ChatGPT of the answers below from 5.4, Gemini, Grok and Claude 4.6 Sonnet:
[2 images]
Ben Duffy @benduffyMMM

Right, starting now, for shits and giggles, I will ask the top ~5 models every year on April 6th to: "predict the next 5 years of AI and AGI progress" Then we can compare over the years: 1. How right/wrong this forecast report got it /th

Ben Duffy @benduffyMMM
I love talking to mini AGIs about what a true AGI will be like
[3 images]
Ben Duffy @benduffyMMM
@karpathy Why not just make all the agents click through websites? I thought computer use was "almost there", as shown by the Claude Chrome extension and ChatGPT agent? Of course, the first step is to give them permissions. Redesigning everything for text in and text out isn't the dream of digital AGI.
Andrej Karpathy @karpathy
When I built menugen ~1 year ago, I observed that the hardest part by far was not the code itself, it was the plethora of services you have to assemble like IKEA furniture to make it real, the DevOps: services, payments, auth, database, security, domain names, etc... I am really looking forward to a day where I could simply tell my agent: "build menugen" (referencing the post) and it would just work. The whole thing up to the deployed web page. The agent would have to browse a number of services, read the docs, get all the api keys, make everything work, debug it in dev, and deploy to prod. This is the actually hard part, not the code itself. Or rather, the better way to think about it is that the entire DevOps lifecycle has to become code, in addition to the necessary sensors/actuators of the CLIs/APIs with agent-native ergonomics. And there should be no need to visit web pages, click buttons, or anything like that for the human. It's easy to state, it's now just barely technically possible and expected to work maybe, but it definitely requires from-scratch re-design, work and thought. Very exciting direction!
Patrick Collison @patrickc

When @karpathy built MenuGen (karpathy.bearblog.dev/vibe-coding-me…), he said: "Vibe coding menugen was exhilarating and fun escapade as a local demo, but a bit of a painful slog as a deployed, real app. Building a modern app is a bit like assembling IKEA furniture. There are all these services, docs, API keys, configurations, dev/prod deployments, team and security features, rate limits, pricing tiers." We've all run into this issue when building with agents: you have to scurry off to establish accounts, clicking things in the browser as though it's the antediluvian days of 2023, in order to unblock its superintelligent progress. So we decided to build Stripe Projects to help agents instantly provision services from the CLI. For example, simply run:
$ stripe projects add posthog/analytics
And it'll create a PostHog account, get an API key, and (as needed) set up billing. Projects is launching today as a developer preview. You can register for access (we'll make it available to everyone soon) at projects.dev. We're also rolling out support for many new providers over the coming weeks. (Get in touch if you'd like to make your service available.)

Ben Duffy @benduffyMMM
I love robots. But sometimes they don't love me... 🥲
Ben Duffy @benduffyMMM
Humanoids. Built in our own image... THE HUBRIS!!! I love it!
Ben Duffy @benduffyMMM
omg, we live in the future: Claude is taking control of my browser to list my dishwasher and 15 other items as ads on eBay (Kleinanzeigen) and Facebook Marketplace.
[1 image]
Ben Duffy @benduffyMMM
Missed this one. 2026/2027 is gonna be the year of AI and science combined.
[1 image]
Ben Duffy @benduffyMMM
Cursor's composer-1 frontier LLM is super fast and accurate. Highly underrated!
Ben Duffy retweeted
Andrej Karpathy @karpathy
The first 100% autonomous coast-to-coast drive on Tesla FSD V14.2! 2 days 20 hours, 2732 miles, zero interventions. This one is special because the coast-to-coast drive was a major goal for the autopilot team from the start. A lot of hours were spent in marathon clip review sessions late into the night looking over interventions as we attempted legs of the drive over time - triaging, categorizing, planning out all the projects to close the gap and bring the number of interventions to zero. Amazing to see the system actually get there and huge congrats to the team!
David Moss @DavidMoss

I am proud to announce that I have successfully completed the world’s first USA coast to coast fully autonomous drive! I left the Tesla Diner in Los Angeles 2 days & 20 hours ago, and now have ended in Myrtle Beach, SC (2,732.4 miles) This was accomplished with Tesla FSD V14.2 with absolutely 0 disengagements of any kind even for all parking including at Tesla Superchargers.

Abhishek B R @abhitwt
For people who keep asking what to build
- Build your own operating system
- Build your database
- Build your virtual machine
- Build your web server
- Build your own game engine
- Build your compiler
- Build your own programming language
- Build your own browser
- Build your own blockchain
- Build your own encryption algorithm
- Build your own CPU emulator
- Build your own file system
- Build your own container runtime
- Build your own package manager
- Build your own shell
- Build your own window manager
- Build your own GUI toolkit
- Build your own text editor
- Build your own IDE
- Build your own version control system
- Build your own network protocol
- Build your own operating system kernel in assembly
- Build your own scheduler
- Build your own memory allocator
- Build your own hypervisor
- Build your own microkernel
- Build your own compiler backend (LLVM target)
- Build your own query language
- Build your own cache system (like Redis)
- Build your own message broker (like Kafka)
- Build your own search engine
- Build your own machine learning framework
- Build your own graphics renderer (rasterizer or ray tracer)
- Build your own physics engine
- Build your own scripting language
- Build your own audio engine
- Build your own database driver
- Build your own networking stack (TCP/IP implementation)
- Build your own API gateway
- Build your own reverse proxy
- Build your own load balancer
- Build your own CI/CD system
- Build your own operating system bootloader
- Build your own container orchestrator (like Kubernetes)
- Build your own distributed file system
- Build your own key-value store
- Build your own authentication server (OAuth2/OpenID Connect)
- Build your own operating system scheduler
- Build your own compiler optimizer
- Build your own disassembler
- Build your own debugger
- Build your own profiler
- Build your own static code analyzer
- Build your own runtime (like Node.js)
- Build your own scripting sandbox
- Build your own browser engine (HTML/CSS/JS parser and renderer)
- Build your own blockchain consensus algorithm
- Build your own zero-knowledge proof system
- Build your own operating system for embedded devices
Ben Duffy retweeted
Chris Offner @chrisoffner3d
Paper naming conventions are reaching a climax.
[1 image]
kache @yacineMTB
is there an accessible 3d model of the human musculoskeletal system that i can navigate?
Ben Duffy @benduffyMMM
Somehow missed this. Always love Minecraft/open-ended papers! The Voyager paper blew my mind, but it used code-gen on the Mineflayer API! The 2022 pixel-to-action papers below are similar but used fine-tuning. But this is with only offline data! I think it's very relevant for robotics.
[1 image]