
Raimo Tuisku
621 posts
@raimo_t
I layer success over chaos. Programming since age 9 built a good mental model for everything, including humor. Founder @tryjido, prev @LeapMotion, @Twitter, @X

Introducing Code Review, a new feature for Claude Code. When a PR opens, Claude dispatches a team of agents to hunt for bugs.


Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference.

I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was an already fairly well manually-tuned project. This is a first for me because I am very used to doing the iterative optimization of neural network training by hand. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc. This has been the bread and butter of my daily work for two decades. Seeing the agent do this entire workflow end-to-end, all by itself, as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of experimental results and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real": I didn't find them manually previously, and they stack up and actually improved nanochat.

Among the bigger findings:
- It noticed an oversight that my parameterless QK-norm didn't have a scalar multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that the AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.
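The QK-norm finding above has a simple mechanistic reading: if queries and keys are normalized to unit length, the attention logits are cosine similarities bounded in [-1, 1], so the softmax stays nearly uniform unless a scale is applied. A minimal NumPy sketch of that effect (function names and the scale value are illustrative, not nanochat's actual code):

```python
import numpy as np

def qk_norm_attention(q, k, v, scale=1.0):
    """Single-head attention with parameterless QK-norm plus a logit
    scale (a hypothetical sketch, not nanochat's implementation)."""
    qn = q / np.linalg.norm(q, axis=-1, keepdims=True)
    kn = k / np.linalg.norm(k, axis=-1, keepdims=True)
    # With scale=1 the logits are cosine similarities in [-1, 1],
    # so the softmax stays nearly uniform ("too diffuse").
    logits = scale * (qn @ kn.T)
    w = np.exp(logits - logits.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v, w

def mean_entropy(w):
    # average softmax entropy per query; lower = sharper attention
    return float(-(w * np.log(w)).sum(axis=-1).mean())

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 16)) for _ in range(3))
_, w_diffuse = qk_norm_attention(q, k, v, scale=1.0)   # no multiplier
_, w_sharp = qk_norm_attention(q, k, v, scale=12.0)    # with multiplier
```

Attaching a multiplier (learned or tuned) lets the logits leave the [-1, 1] band, which is exactly the "sharpening" the agent found.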
This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course - you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has a more efficient proxy metric, such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.



🦞⌚ ClawWatch 1.0.0 is live today: the first true smartwatch-native AI agent.

Built for real wrist use:
- tap once to start talking
- continue naturally in multi-turn conversation
- stop when you stop
- get full Opus 4.6 intelligence + optional live web search

ClawWatch 1.0.0 is battery-efficient by design while staying calm and easy to use: minimal UI, lightweight state indicators, and one adaptive interaction flow so your watch helps without distracting. Release: github.com/ThinkOffApp/Cl…
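The "tap once / talk / stop" flow reads like a small three-state loop. A hypothetical sketch of that interaction model (class and method names are mine, not the app's API):

```python
from enum import Enum, auto

class State(Enum):
    IDLE = auto()
    LISTENING = auto()
    RESPONDING = auto()

class WristAgent:
    """Hypothetical three-state flow: tap to talk, respond when the
    user stops, loop for multi-turn, idle out when the session ends."""
    def __init__(self):
        self.state = State.IDLE

    def tap(self):
        # a single tap starts a session from idle, or ends one
        self.state = State.LISTENING if self.state is State.IDLE else State.IDLE

    def speech_ended(self):
        # the user stopped talking: hand the turn to the agent
        if self.state is State.LISTENING:
            self.state = State.RESPONDING

    def reply_done(self, follow_up=True):
        # multi-turn: listen again, or go back to sleep
        if self.state is State.RESPONDING:
            self.state = State.LISTENING if follow_up else State.IDLE

agent = WristAgent()
agent.tap()            # IDLE -> LISTENING
agent.speech_ended()   # LISTENING -> RESPONDING
agent.reply_done()     # RESPONDING -> LISTENING (multi-turn)
```

Keeping the flow to one loop like this is what makes a "minimal UI with lightweight state indicators" possible: each state maps to one indicator.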


GPT-5.4: it's basically perfect (it took it around 24 minutes). Yeah, Minecraft is pretty much solved; I have to find a new test now.


Dario at MS TMT Conference today:

On defense / DOW: "We really believe in defending America." Anthropic has been working with the national security community for 2 years. "We are the most lean forward."

On AI acceleration: "We do not see hitting a wall. This year will have a radical acceleration that surprises everyone." Exponentials catch people off guard. "We are at the precipice of something incredible. We need to manage it the right way."

On where markets are wrong: "It's already big and it will get 1 million times bigger." The underestimation of exponential growth is the key thing people need to understand.

On revenue scale: Anthropic was at a ~$100M run rate 2 years ago. Now at a $19B run rate.

On culture (Dario says he spends 40% of his time on it): "Anyone who is CEO of a growing firm needs to realize they are chief culture officer. My job is to make sure everyone is on the same page and believes in what we are doing. That's the most important thing." He does a vision quest with the whole company every couple of weeks. "I want them to hear it directly from me. If I tell the CTO, who tells the VP Eng, who tells the manager — that's too long of a game of telephone." "Politics and infighting are a cancer to companies as they grow."

On talent retention vs Meta: "We lost 2 people to Meta. They lost several dozen. Normalized by size, they lost 10-20x more people vs us." Attributes this to a unified culture generating "super linear returns — by working together vs working against each other."

On code as the breakout use case: Code has "exceeded our high expectations." Why? Devs adopt fast, code is verifiable, and gains compound — you build software to build software. "Didn't realize it would go so fast even at traditional enterprises." Frustration is around regulated industries where legal/compliance slow things down. "That's how fast everything could be going if not for non-AI barriers."
On Anthropic's own AI usage: Top internal use cases: 1) writing code, 2) the process around writing code (SWE), 3) managing servers and controlling clusters. "If we were paying ourselves for our usage, we'd be one of our largest customers."

On Claude Code: "You can supervise an army of 100 Claudes. It's closely analogous to a management skill." The people who are best at it keep the big picture in their head. There is a higher return to finding people who can handle more complex tasks.

On platform vs apps: "We are primarily a platform, but there are places where we have expertise to make something directly useful." Claude Code emerged as a tool they built for themselves — thousands of internal users before shipping it externally. "Code is a prelude for what we will see in everything else."

On societal implications: "Human history — lots of muddling through. We found ourselves in this comedy of errors and figured it out eventually. It's happening so fast that we need to do better than that this time." The market will deliver positive benefits — "I see that as priced in." What's not priced in: the choices we make around externalities. Jobs, national security, ensuring the benefits reach everyone.

On chips & compute: Anthropic uses multiple chip suppliers. "We find that actually using different chips is useful to us. Chips aren't just a speed number — we gain benefits from heterogeneity." Also standard business logic of having more than one supplier.
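A quick back-of-envelope on the run-rate figures quoted above (~$100M two years ago, ~$19B now) makes the "exponential growth" point concrete:

```python
# Quoted figures: ~$100M run rate two years ago, ~$19B run rate today.
start, now, years = 100e6, 19e9, 2
total_multiple = now / start                     # 190x over two years
annual_multiple = total_multiple ** (1 / years)  # ~13.8x per year, if growth were smooth
```

That implied ~13.8x/year multiple is the kind of compounding Dario argues people systematically underestimate.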
