Jouni Helminen

924 posts

Jouni Helminen

@dharmaone

design, open source, music

London Katılım Mart 2009

7K Takip Edilen1.4K Takipçiler

Jouni Helminen retweetledi

Bookmark Bro@bookmarkbroski·21h

Bookmarked something fire on X… then spent 15 minutes scrolling trying to find it again? 😩 We got you fam. Meet BookmarkBro — a beautiful native Mac app for browsing, super-fast search, tagging, and chatting with AI about your X bookmarks. All locally on your Mac. No privacy leaks. No new sign-ups. No cloud nonsense. Free in beta. More + download in the replies 👇

English

13.7K

Jouni Helminen retweetledi

Aman@Amank1412·26 Nis

USING Claude Opus 4.7 TO CENTER A DIV

English

351

2.3K

28.9K

1.8M

Jouni Helminen@dharmaone·17 Nis

@DylanWeaver @HinataMotivates Dwarkesh is like a 1b param model doing 1000 tok/s stumbling over in a reasoning loop. Jensen is a 2t param yoda one shotting it

English

338

Dylan@DylanWeaver·16 Nis

@HinataMotivates Speaking faster makes him wrong faster.

English

293

11K

hinata@HinataMotivates·16 Nis

Jensen Huang gets into a heated argument over selling chips to China.

English

208

220

4.5K

1.2M

Jouni Helminen@dharmaone·17 Nis

@aakashgupta Was embarrassing to watch tbh. Dwarkesh doesn’t get it

English

Aakash Gupta@aakashgupta·17 Nis

Props to Dwarkesh for going toe to toe with the CEO of the world’s largest company like this

English

1.6K

153.4K

Jouni Helminen@dharmaone·17 Nis

Great interview. Only one codex model runs on cerebras afaik - 5.3-spark. I’ve been testing it - very fast but the quality isn’t great. Tiny context window and not as good overall as 5.4. I think this is because the chip only has 44gb sram. @MatXComputing will have an interesting blend of sram (weights) and HBM (kv cache) and Nvidia will do more with Groq over time for fast inference of some workloads no doubt. Huang is right in that Nvidia GPUs/CUDA is more general and more future proof for architecture changes than TPUs optimised for current workloads/architectures. He also said that the main reason Anthropic is using TPUs is because Google/Amazon are large investors in them and Nvidia wasn’t able to invest early on - not sure how true that is but was interesting. China doesn’t have access to latest lithography for competitive power efficiency but will build EUV (or whatever comes after) capabilities eventually, likely in the next decade. They are moving pretty fast elsewhere (models obviously, but also fast 3d DDR5 from CXMT, Huawei etc for processors). I think the chip ban is probably bad long term, might have been better to keep them on nvidia instead of accelerating home grown alternatives

English

399

Ejaaz@cryptopunk7213·16 Nis

ridiculous amount of alpha in this post, gavin knows this shit better than anyone. tldr: - the switching-cost to train your model on a different type of gpu is very high now - translation: ai labs are becoming increasingly reliant on their GPU maker (which gives nvidia a lot of power) - labs are now literally designing their models to work with specific gpus - google’s gemini needs tpus, openai needs cerebras / nvidia - Anthropic is the ONLY ONE that can afford to switch. why? because they train claude across tpus, trainium and nvidia - but inference is now way more important than pre-training aka the TYPE of gpu matters more - chinese models are trained on chips VERY different to americas = their models wont run on our hardware.

Gavin Baker@GavinSBaker

Much of Dwarkesh's argument hinges on this statment which *was* accurate but will be increasingly inaccurate on a go forward basis imo: “American labs port across accelerators constantly. Anthropic's models are run on GPUs, they're run on Trainium, they're run on TPUs. There are so many things you can do, from distilling to a model that's well fit for your chips.” As system level architectures diverge (torus vs. switched scale-up topologies, memory hierarchies, networking primitives), true portability is eroding. The Mi300 and Mi325 had roughly the same scale-up domain size as Hopper while Blackwell’s scale-up domain is 9x larger than the Mi355 scale-up domain, etc. Many frontier models are now being explicitly co-designed for inference on specific hardware like GB300 racks. Codex on Cerebras is another example. Those models run less efficiently on other systems and the performance differentials will only widen. A model that runs well on Google’s torus topology will run less efficiently on Nvidia’s switched scale-up topology and vice versa - the data traffic is fundamentally different as a byproduct of the models being parallelized across the different topologies. Google’s internal teams - and increasingly the Anthropic teams as they become the most important customer of almost every cloud - have the luxury of operating across the stack (models, chips, networking) - but that is not the case for the rest of the market and other prospective users. Anthropic is the exception, not the rule. To wit, Anthropic and Google allegedly have a mutual understanding where Anthropic can hire the TPU engineers they need every year to ensure that they can continue to get the most out of the TPU. Given the overwhelming importance of cost per token to the economics of the labs, models will be run where they run best. Most extremely large MoE models will run best on GB300s given the importance of having a switched scale-up network like NVLink for MoE inference. When training was the dominant cost for labs and power was broadly available, labs were optimizing to minimize capex dollars. Model portability was a way to create leverage over suppliers. I think that drove a lot of the focus on portability. Today, inference costs as measured by tokens per watt per dollar are everything. Inference is way more important than training costs (inference is effectively now part of training via RL). Labs are therefore now optimizing for inference. This means increasing co-design and higher go-forward switching costs for individual models between systems. I do think this explains why Anthropic and Nvidia came together: Anthropic needed Blackwells and Rubins to inference at least *some* of their models economically. And Mythos might just end up being released coincident with the availability of Rubins for inference. TLDR: as labs shift their focus from training to inference, the costs of portability and the upside of co-design to maximize tokens per watt per dollar both rise. Portability is likely to begin decreasing as a result. I think what I might have respectfully added to Jensen’s answer is that systems evolve under local selective pressures. The evolutionary pressure in America is a shortage of watts so it makes sense for Nvidia to optimize, as an American company, for power efficiency and tokens per watt and stay on copper as long as possible. China has a surfeit of watts. Chinese AI systems are already taking advantage of this with the Huawei Cloudmatrix 384 and Atlas SuperPoD having an optical scale-up domain that is much larger than anything offered by Nvidia today at the cost of *much* higher power consumption and much lower tokens per watt. The networking primitives for this Huawei system are very different than those for Nvidia’s systems and a model that runs well on Nvidia will not run well on that system and vice versa. This means that if a Chinese ecosystem gets momentum, Chinese models might stop running well on American hardware. And when Chinese models run best on American hardware, America is in a better position as this gives America a degree of leverage and control over Chinese AI that it risks losing to an all-Chinese alternative ecosystem. This architectural fork makes porting and distillation less effective and strengthens the pro-American national security case for selling China deprecated GPUs imo. Also I will attest that I did not wake up a loser this morning.

English

400

84.8K

Jouni Helminen@dharmaone·10 Nis

@claudeai this is the way. the executor could be a local model also, or a realtime voice model that does tool calling for complex tasks when needed but doesn't stop the voice conversation

English

1.1K

Claude@claudeai·9 Nis

We're bringing the advisor strategy to the Claude Platform. Pair Opus as an advisor with Sonnet or Haiku as an executor, and get near Opus-level intelligence in your agents at a fraction of the cost.

English

2.8K

38.5K

4.7M

Jouni Helminen@dharmaone·5 Nis

@elonmusk Ramanujan is a good example also. Speedrunning in representation space of compressed insights + intuition vs reasoning in language

English

500

Elon Musk@elonmusk·5 Nis

Hadamard thought in image space

English

3.1K

3.4K

53.6K

67.4M

Jouni Helminen@dharmaone·3 Nis

@mustafasuleyman @MicrosoftAI Open weights?

English

133

Mustafa Suleyman@mustafasuleyman·2 Nis

Three models. Three top-tier results. All shipped within just a few months by the @MicrosoftAI team. - MAI-Transcribe-1 dropped today, the most accurate transcription model in the world across 25 languages according to FLEURS WER benchmark. - MAI-Voice-1 sets a new standard for natural speech. - MAI-Image-2 lands as a top 3 model family on @arena. We've been building with them - now you can too. All 3 available now on Microsoft Foundry.

English

544

76.6K

Jouni Helminen@dharmaone·1 Nis

@justbyte_ Basic, then pascal

English

Aryan@justbyte_·31 Mar

In which programming language you wrote your first "Hello World" ?

English

484

543

32.8K

Jouni Helminen@dharmaone·29 Mar

@amix3k @soumo_dg it's a really great model and the optional reasoning and tool calling are great too. but i wonder how this will scale to every user on popular apps unless metered. some day a model like this will run on device

English

Amir Salihefendić@amix3k·28 Mar

@soumo_dg I’m just exploring, no idea if we’ll add this to Todoist at some point 👍😊

English

421

Amir Salihefendić@amix3k·28 Mar

Gemini 3.1 Flash Live launched a few days ago, and it's a pretty incredible real-time model. We're getting very close to everyone having their own JARVIS assistant. A small demo of a Todoist voice assistant built with the new model.

English

368

35.4K

Jouni Helminen@dharmaone·25 Mar

@bradneuberg @BoWang87 emergentmind.com/topics/sketche…

QME

Brad Neuberg@bradneuberg·23 Mar

@BoWang87 Any good introductions explaining how SIGReg Gaussian regularization works?

English

396

Bo Wang@BoWang87·23 Mar

This is essentially LeCun's JEPA dream made practical— a clean, efficient, collapse-free world model that learns entirely from pixels with minimal engineering tricks. The key insight (SIGReg Gaussian regularization) is surprisingly simple.

Lucas Maes@lucasmaes_

JEPA are finally easy to train end-to-end without any tricks! Excited to introduce LeWorldModel: a stable, end-to-end JEPA that learns world models directly from pixels, no heuristics. 15M params, 1 GPU, and full planning <1 second. 📑: le-wm.github.io

English

667

75.9K

Jouni Helminen@dharmaone·25 Mar

@_DataStrategies @BoWang87 Yes

dataStrategies@_DataStrategies·24 Mar

@BoWang87 SIGReg gaussian regularization was already in LeJEPA, wasn't it?

English

130

Jouni Helminen@dharmaone·14 Mar

@cherry_cc12 Any s2s realtime voice model like Omni for qwen 3.5?

English

Chen Cheng@cherry_cc12·24 Şub

Yes, we just dropped it — Flash, 35B-A3B, 122B-A10B & 27B 🚀 Quick take from me: Flash is basically 35B-A3B but with longer context out of the box — great for production. 27B Dense is honestly my favorite for indie devs. Runs on a single GPU, multimodal, and I've been playing with it myself — the Code Agent and reasoning are legit good. 122B-A10B finally shows what this architecture is really capable of. And 35B-A3B is so insane value. Smart, fast to run, highly efficient. Hard to beat at this size. Hope you like it — and seriously, try our model. Your feedback is how we get better 🙏 try them now: chat.qwen.ai

Qwen@Alibaba_Qwen

🚀 Introducing the Qwen 3.5 Medium Model Series Qwen3.5-Flash · Qwen3.5-35B-A3B · Qwen3.5-122B-A10B · Qwen3.5-27B ✨ More intelligence, less compute. • Qwen3.5-35B-A3B now surpasses Qwen3-235B-A22B-2507 and Qwen3-VL-235B-A22B — a reminder that better architecture, data quality, and RL can move intelligence forward, not just bigger parameter counts. • Qwen3.5-122B-A10B and 27B continue narrowing the gap between medium-sized and frontier models — especially in more complex agent scenarios. • Qwen3.5-Flash is the hosted production version aligned with 35B-A3B, featuring: – 1M context length by default – Official built-in tools 🔗 Hugging Face: huggingface.co/collections/Qw… 🔗 ModelScope: modelscope.cn/collections/Qw… 🔗 Qwen3.5-Flash API: modelstudio.console.alibabacloud.com/ap-southeast-1… Try in Qwen Chat 👇 Flash: chat.qwen.ai/?models=qwen3.… 27B: chat.qwen.ai/?models=qwen3.… 35B-A3B: chat.qwen.ai/?models=qwen3.… 122B-A10B: chat.qwen.ai/?models=qwen3.… Would love to hear what you build with it.

English

672

49K

Jouni Helminen@dharmaone·11 Mar

@elonmusk v cool. will AI4/5 be sold separately? And will you be able to use the AI4/5 chip in your car for other inference tasks (like Digital Optimus) while not driving?

English

Elon Musk@elonmusk·11 Mar

Macrohard or Digital Optimus is a joint xAI-Tesla project, coming as part of Tesla’s investment agreement with xAI. Grok is the master conductor/navigator with deep understanding of the world to direct digital Optimus, which is processing and actioning the past 5 secs of real-time computer screen video and keyboard/mouse actions. Grok is like a much more advanced and sophisticated version of turn-by-turn navigation software. You can think of it as Digital Optimus AI being System 1 (instinctive part of the mind) and Grok being System 2. (thinking part of the mind). This will run very competitively on the super low cost Tesla AI4 ($650) paired with relatively frugal use of the much more expensive xAI Nvidia hardware. And it will be the only real-time smart AI system. This is a big deal. In principle, it is capable of emulating the function of entire companies. That is why the program is called MACROHARD, a funny reference to Microsoft. No other company can yet do this.

English

8.2K

11.1K

78.9K

47.7M

Jouni Helminen@dharmaone·9 Mar

@yezhang1998 From 48:24

English

Jouni Helminen@dharmaone·9 Mar

This was a great recent interview - youtu.be/ukpCHo5v-Gc?si… Good fit for millions of low complexity problems that are still unsolved and are verifiable Coding is a bit special in the sense that there is potential for RSI - starting to see that with Karpathy’s new autoresearcher, AI optimised CUDA kernels etc

YouTube

English

263

Ye Zhang@yezhang1998·9 Mar

I think RL with verifiable rewards will become increasingly important in pushing LLMs toward their own “AlphaZero moment.” It will likely begin with coding, then extend to math, physics, and other domains where models can self-explore, discover out-of-distribution solutions humans might never imagine, and verify them using an absolute reward signal (0/1). This also reminds me of @elonmusk talking about a future where programs could be generated directly as binaries, without going through the traditional compilation process. That may actually be possible if LLMs can generate binary code and then execute it directly against a verifiable reward.

SAIR@SAIRfoundation

Terence Tao: Formal Verification Breaks the Trust Barrier in Mathematics Formal verification is transforming mathematical collaborations — enabling anonymous contributions, machine-checked proofs, and radically more precise scientific discussion.

English

17.5K

Jouni Helminen@dharmaone·18 Şub

@tomjohndesign This plugin has worked very well for the same task but it’s great to see Figma embrace Claude code more. More excited to see design systems integration and two way flows in the future figma.com/community/plug…

English

306

Tom Johnson@tomjohndesign·17 Şub

This is incredible. I'm seeing people bashing this but I'm pretty sure they've never had to go through the pain of working and trying to recreate complex web apps in Figma to tweak layouts and try new variants -- remove cards, update copy, try different information density, etc. I'm now able to work directly from the Vercel dashboard as the source of design truth and then explore layout changes in the canvas. This is such a big unlock for me I can't even begin to explain it. I just recreated basically all of the data-heavy core UI of Vercel (something that was nearly impossible to do before) with ACTUAL data in less than 5 minutes. Charts, lists, tables, filters, all one-shotted. This is crazy crazy crazy.

Dylan Field@zoink

x.com/i/article/2023…

English

438

97K

Jouni Helminen@dharmaone·10 Şub

@bencera @ryancarson macOS/iOS STT APIs do the heavy lifting- the weights are bundled with the OS. Still, looks great and very useful

English

Ben Cera@Bencera·10 Şub

@ryancarson 913kb teleprompter in a world of 2GB apps that show you a loading screen. we went wrong somewhere

English

736

Ryan Carson@ryancarson·10 Şub

Just downloaded this and tried it. Freaking awesome.

fatih kadir akın@fkadev

🔥 Introducing Textream, a macOS teleprompter that uses on-device speech recognition to highlight your script as you speak. Dynamic Island-style overlay, no cloud, works offline. Only 913kb 😍 → blog.fka.dev/textream/ → brew install f/textream/textream

English

600

89.2K

Jouni Helminen@dharmaone·29 Oca

@bnj @variantui Looks great! 👌

English

Ben South@bnj·28 Oca

Introducing @variantui Enter an idea and get endless (beautiful) designs as you scroll No canvas, no skills or MCP, no constant prompting Reply if you'd like 200 free designs to give it try

English

2.2K

271

4.2K

1.1M

Jouni Helminen retweetledi

Volodymyr Zelenskyy / Володимир Зеленський@ZelenskyyUa·23 Oca

There was so much talk about the protests in Iran – but they drowned in blood. The world has not helped enough the Iranian people, it has stood aside. What will Iran become after this bloodshed? If the regime survives, it sends a clear signal to every bully – kill enough people, and you stay in power.

English

9.8K

23.7K

71.8K

1.6M

Keşfet

@DylanWeaver @HinataMotivates @aakashgupta @MatXComputing @claudeai @elonmusk @mustafasuleyman @MicrosoftAI