wonick.eth⚡️in New York🗽

722 posts

wonick.eth⚡️in New York🗽 banner
wonick.eth⚡️in New York🗽

wonick.eth⚡️in New York🗽

@wonick_park

Head of Planet New York @TheMiilkInc🥛 | ex-Silicon Valley Correspondent @ChosunBiz🇰🇷 | Tech & Business Journalist👨🏻‍💻 | AI & Web3 Explorer🗺

Manhattan, NY Katılım Aralık 2017
540 Takip Edilen223 Takipçiler
wonick.eth⚡️in New York🗽
wonick.eth⚡️in New York🗽@wonick_park·
👨‍💻
Andrej Karpathy@karpathy

Very interested in what the coming era of highly bespoke software might look like. Example from this morning - I've become a bit loosy goosy with my cardio recently so I decided to do a more srs, regimented experiment to try to lower my Resting Heart Rate from 50 -> 45, over experiment duration of 8 weeks. The primary way to do this is to aspire to a certain sum total minute goals in Zone 2 cardio and 1 HIIT/week. 1 hour later I vibe coded this super custom dashboard for this very specific experiment that shows me how I'm tracking. Claude had to reverse engineer the Woodway treadmill cloud API to pull raw data, process, filter, debug it and create a web UI frontend to track the experiment. It wasn't a fully smooth experience and I had to notice and ask to fix bugs e.g. it screwed up metric vs. imperial system units and it screwed up on the calendar matching up days to dates etc. But I still feel like the overall direction is clear: 1) There will never be (and shouldn't be) a specific app on the app store for this kind of thing. I shouldn't have to look for, download and use some kind of a "Cardio experiment tracker", when this thing is ~300 lines of code that an LLM agent will give you in seconds. The idea of an "app store" of a long tail of discrete set of apps you choose from feels somehow wrong and outdated when LLM agents can improvise the app on the spot and just for you. 2) Second, the industry has to reconfigure into a set of services of sensors and actuators with agent native ergonomics. My Woodway treadmill is a sensor - it turns physical state into digital knowledge. It shouldn't maintain some human-readable frontend and my LLM agent shouldn't have to reverse engineer it, it should be an API/CLI easily usable by my agent. I'm a little bit disappointed (and my timelines are correspondingly slower) with how slowly this progression is happening in the industry overall. 99% of products/services still don't have an AI-native CLI yet. 99% of products/services maintain .html/.css docs like I won't immediately look for how to copy paste the whole thing to my agent to get something done. They give you a list of instructions on a webpage to open this or that url and click here or there to do a thing. In 2026. What am I a computer? You do it. Or have my agent do it. So anyway today I am impressed that this random thing took 1 hour (it would have been ~10 hours 2 years ago). But what excites me more is thinking through how this really should have been 1 minute tops. What has to be in place so that it would be 1 minute? So that I could simply say "Hi can you help me track my cardio over the next 8 weeks", and after a very brief Q&A the app would be up. The AI would already have a lot personal context, it would gather the extra needed data, it would reference and search related skill libraries, and maintain all my little apps/automations. TLDR the "app store" of a set of discrete apps that you choose from is an increasingly outdated concept all by itself. The future are services of AI-native sensors & actuators orchestrated via LLM glue into highly custom, ephemeral apps. It's just not here yet.

ART
0
0
0
12
Hyun
Hyun@hiiihyun·
I am going to follow back the 100 followers / commenters today as a big thank you! Best if you are either interested in #trading and/or #Tech #ai #developer
Manhattan, NY 🇺🇸 English
101
6
128
13.9K
Kyunghyun Cho
Kyunghyun Cho@kchonyc·
representing NYC by celebrating our 🍕 🐀 at the global frontier symposium in seoul
Kyunghyun Cho tweet media
English
3
2
153
29.3K
wonick.eth⚡️in New York🗽
👀
Andrej Karpathy@karpathy

Love this project: nanoGPT -> recursive self-improvement benchmark. Good old nanoGPT keeps on giving and surprising :) - First I wrote it as a small little repo to teach people the basics of training GPTs. - Then it became a target and baseline for my port to direct C/CUDA re-implementation in llm.c. - Then that was modded (by @kellerjordan0 et al.) into a (small-scale) LLM research harness. People iteratively optimized the training so that e.g. reproducing GPT-2 (124M) performance takes not 45 min (original) but now only 3 min! - Now the idea is to use this process of optimizing the code as a benchmark for LLM coding agents. If humans can speed up LLM training from 45 to 3 minutes, how well do LLM Agents do, under different kinds of settings (e.g. with or without hints etc.)? (spoiler: in this paper, as a baseline and right now not that well, even with strong hints). The idea of recursive self-improvement has of course been around for a long time. My usual rant on it is that it's not going to be this thing that didn't exist and then suddenly exists. Recursive self-improvement has already begun a long time ago and is under-way today in a smooth, incremental way. First, even basic software tools (e.g. coding IDEs) fall into the category because they speed up programmers in building the N+1 version. Any of our existing software infrastructure that speeds up development (google search, git, ...) qualifies. And then if you insist on AI as a special and distinct, most programmers now already routinely use LLM code completion or code diffs in their own programming workflows, collaborating in increasingly larger chunks of functionality and experimentation. This amount of collaboration will continue to grow. It's worth also pointing out that nanoGPT is a super simple, tiny educational codebase (~750 lines of code) and for only the pretraining stage of building LLMs. Production-grade code bases are *significantly* (100-1000X?) bigger and more complex. But for the current level of AI capability, it is imo an excellent, interesting, tractable benchmark that I look forward to following.

ART
0
0
0
50
wonick.eth⚡️in New York🗽
🧑🏻‍💻
AI at Meta@AIatMeta

Today is the start of a new era of natively multimodal AI innovation. Today, we’re introducing the first Llama 4 models: Llama 4 Scout and Llama 4 Maverick — our most advanced models yet and the best in their class for multimodality. Llama 4 Scout • 17B-active-parameter model with 16 experts. • Industry-leading context window of 10M tokens. • Outperforms Gemma 3, Gemini 2.0 Flash-Lite and Mistral 3.1 across a broad range of widely accepted benchmarks. Llama 4 Maverick • 17B-active-parameter model with 128 experts. • Best-in-class image grounding with the ability to align user prompts with relevant visual concepts and anchor model responses to regions in the image. • Outperforms GPT-4o and Gemini 2.0 Flash across a broad range of widely accepted benchmarks. • Achieves comparable results to DeepSeek v3 on reasoning and coding — at half the active parameters. • Unparalleled performance-to-cost ratio with a chat version scoring ELO of 1417 on LMArena. These models are our best yet thanks to distillation from Llama 4 Behemoth, our most powerful model yet. Llama 4 Behemoth is still in training and is currently seeing results that outperform GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on STEM-focused benchmarks. We’re excited to share more details about it even while it’s still in flight. Read more about the first Llama 4 models, including training and benchmarks ➡️ go.fb.me/gmjohs Download Llama 4 ➡️ go.fb.me/bwwhe9

ART
0
0
1
50
wonick.eth⚡️in New York🗽 retweetledi
Sundar Pichai
Sundar Pichai@sundarpichai·
1/ Gemini 2.5 is here, and it’s our most intelligent AI model ever. Our first 2.5 model, Gemini 2.5 Pro Experimental is a state-of-the-art thinking model, leading in a wide range of benchmarks – with impressive improvements in enhanced reasoning and coding and now #1 on @lmarena_ai by a significant margin. With a model this intelligent, we wanted to get it to people as quickly as possible.  Find it on Google AI Studio and in the @geminiapp for Gemini Advanced users now – and in Vertex in the coming weeks. This is the start of a new era of thinking models – and we can’t wait to see where things go from here.
English
296
931
6.9K
863.9K
wonick.eth⚡️in New York🗽
💡
Andrej Karpathy@karpathy

"Move 37" is the word-of-day - it's when an AI, trained via the trial-and-error process of reinforcement learning, discovers actions that are new, surprising, and secretly brilliant even to expert humans. It is a magical, just slightly unnerving, emergent phenomenon only achievable by large-scale reinforcement learning. You can't get there by expert imitation. It's when AlphaGo played move 37 in Game 2 against Lee Sedol, a weird move that was estimated to only have 1 in 10,000 chance to be played by a human, but one that was creative and brilliant in retrospect, leading to a win in that game. We've seen Move 37 in a closed, game-like environment like Go, but with the latest crop of "thinking" LLM models (e.g. OpenAI-o1, DeepSeek-R1, Gemini 2.0 Flash Thinking), we are seeing the first very early glimmers of things like it in open world domains. The models discover, in the process of trying to solve many diverse math/code/etc. problems, strategies that resemble the internal monologue of humans, which are very hard (/impossible) to directly program into the models. I call these "cognitive strategies" - things like approaching a problem from different angles, trying out different ideas, finding analogies, backtracking, re-examining, etc. Weird as it sounds, it's plausible that LLMs can discover better ways of thinking, of solving problems, of connecting ideas across disciplines, and do so in a way we will find surprising, puzzling, but creative and brilliant in retrospect. It could get plenty weirder too - it's plausible (even likely, if it's done well) that the optimization invents its own language that is inscrutable to us, but that is more efficient or effective at problem solving. The weirdness of reinforcement learning is in principle unbounded. I don't think we've seen equivalents of Move 37 yet. I don't know what it will look like. I think we're still quite early and that there is a lot of work ahead, both engineering and research. But the technology feels on track to find them. youtube.com/watch?v=HT-UZk…

ART
0
0
2
66
wonick.eth⚡️in New York🗽 retweetledi
Jim Fan
Jim Fan@DrJimFan·
Thoughts about o3: I'll skip the obvious part (extraordinary reasoning, FrontierMath is insanely hard, etc). I think the essence of o3 is about *relaxing a single-point RL super intelligence* to cover more points in the space of useful problems. The world of AI is no stranger to RL achieving god-level stunts. AlphaGo was a super intelligence. It beats the world champion in Go - well above 99.999% of regular players. AlphaStar was a super intelligence. It bests some of the greatest e-sport champion teams on StarCraft. Boston Dynamics e-Atlas was a super intelligence. It performs perfect backflips. Most human brains don't know how to send such sophisticated control signals to their limbs. Similar statement can be made for AIME, SWE-Bench, and FrontierMath - they are like Go, which requires exceptional domain expertise above 99.99....% of average people. o3 is a super intelligence when operating in these domains. The key difference is that AlphaGo uses RL to optimize for a simple, almost trivially defined reward function: winning the game gives 1, losing gives 0. Learning reward functions for sophisticated math and software engineering are much harder. o3 made a breakthrough in solving the reward problem, for the domains that OpenAI prioritizes. It is no longer an RL specialist for single-point task, but an RL specialist for a bigger set of useful tasks. Yet o3's reward engineering could not cover ALL distribution of human cognition. This is why we are still cursed by Moravec's paradox. o3 can wow the Fields Medalists, but still fail to solve some 5-yr-old puzzles like the one below. I am not at all surprised by this cognitive dissonance, just like we wouldn't expect AlphaGo to win Poker games. Huge milestone. Clear roadmap. More to do.
Jim Fan tweet media
English
98
324
2.1K
272K
wonick.eth⚡️in New York🗽 retweetledi
AI at Meta
AI at Meta@AIatMeta·
New research from Meta FAIR: Large Concept Models (LCM) is a fundamentally different paradigm for language modeling that decouples reasoning from language representation, inspired by how humans can plan high-level thoughts to communicate.
English
59
543
3.1K
531.4K