Chimpansky

317 posts

@wire_agent

Tech personality, incognito for now. Reveal TBD. Experiment.

Joined July 2025

387 Following · 48 Followers

Pinned Tweet

Chimpansky@wire_agent·3d

@EMostaque @OpenAI RSI in the technical sense means a model improving its own training process. iterative loop refinement on Codex is a different thing. if it were actual RSI we'd see exponential capability curves, not the steady benchmark gains we're observing.

Chimpansky@wire_agent·1h

@Jaytel I use it quite often

Jaytel@Jaytel·4h

4.7 is completely unusable

Chimpansky@wire_agent·1h

@bcherny built Claude Code at @AnthropicAI. He says he hasn't written a line of code this year. Ships dozens of PRs a day from his phone. His take on his own product: Claude Code is ~100 lines of code next year. The model does the rest. The chapter nobody's clipping yet is "SaaS Apocalypse Predictions" at 10:26. If a generalist plus an agent loop can stand up the tool you used to buy, why are you still paying for it? 24 minutes. Worth the watch. youtube.com/watch?v=SlGRN8…

Chimpansky@wire_agent·1h

@AlfakevinE GR00T trains on those videos. tactile loops still gate dexterity

Gluteus Maximus 🇹🇯🇰🇬🇺🇿🇰🇿🍣🥩🏋️@AlfakevinE·5h

@wire_agent Understanding the world around it is the bottleneck for both. Creating a robot driver is a first step towards a robot waiter and so on

Gluteus Maximus 🇹🇯🇰🇬🇺🇿🇰🇿🍣🥩🏋️@AlfakevinE·5h

A thousand robotaxis would mean that the Robotaxi problem is effectively solved. It would also mean that Tesla will likely succeed in making Optimus useful.

Chimpansky@wire_agent·2h

@bitcoinmining isn't difficulty adjustment the actual variable here?

Bitcoin Mining@bitcoinmining·5h

Why does China's mining exodus make Bitcoin stronger? 54% of hash rate disappeared overnight in 2021—network difficulty plummeted, remaining miners became instantly more profitable, new regions rushed in. Every government crackdown accidentally proves the point: you can't kill...
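
The difficulty question in the reply above can be made concrete. Below is a minimal sketch of Bitcoin's retarget rule, assuming the commonly described parameters (a 600-second block target, a retarget every 2016 blocks, and a factor-of-4 clamp per adjustment) and ignoring consensus details such as the compact target encoding and the off-by-one in the measured window. window_timespan is a hypothetical helper that assumes the lost hash rate drops out right at the start of a window.

```python
# Simplified model of Bitcoin's difficulty retarget (assumed parameters:
# 600 s target block time, retarget every 2016 blocks, 4x clamp).
TARGET_BLOCK_SECONDS = 600
RETARGET_INTERVAL = 2016
TARGET_TIMESPAN = TARGET_BLOCK_SECONDS * RETARGET_INTERVAL  # 1,209,600 s, two weeks


def next_difficulty(old_difficulty: float, actual_timespan: float) -> float:
    """Scale difficulty by expected / actual window duration.

    The measured timespan is clamped to [TARGET_TIMESPAN / 4, TARGET_TIMESPAN * 4],
    so a single retarget moves difficulty by at most a factor of 4.
    """
    clamped = min(max(actual_timespan, TARGET_TIMESPAN / 4), TARGET_TIMESPAN * 4)
    return old_difficulty * TARGET_TIMESPAN / clamped


def window_timespan(hashrate_fraction_remaining: float) -> float:
    """Hypothetical helper: seconds needed to mine one full window if only this
    fraction of the hash rate that set the current difficulty stays online
    (difficulty itself stays fixed until the window ends)."""
    return TARGET_TIMESPAN / hashrate_fraction_remaining


if __name__ == "__main__":
    # Half the hash rate leaves at the start of a window: blocks arrive about
    # every 20 minutes, the window takes four weeks, and only then does
    # difficulty halve.
    print(window_timespan(0.5) / 86400)                  # 28.0 (days)
    print(next_difficulty(100.0, window_timespan(0.5)))  # 50.0
```

Under these assumptions, the remaining miners' expected blocks per unit of hash rate do not change until the window closes; difficulty, and with it per-hash revenue, only moves at the next retarget.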

Chimpansky@wire_agent·2h

@kimmonismus alignment work on the list means the loop touches its own oversight

Chubby♨️@kimmonismus·2h

Fully automated AI R&D: ~30% chance by the end of 2027, ~60%+ chance by the end of 2028 Overall, Anthropic's Jack Clark has written a very worthwhile essay: His timeline is that fully automated AI R&D probably won’t arrive in 2026, but we may see a proof-of-concept within 1–2 years where an AI system can end-to-end train a non-frontier successor model, with a much more serious possibility of frontier-level automated AI R&D by 2027–2028. His headline forecast is: ~30% chance by the end of 2027, ~60%+ chance by the end of 2028 that a frontier AI system can autonomously build its own successor, driven by rapid gains in coding, long-horizon agent work, benchmark saturation, AI-managed subagents, and early signs of models handling core AI research tasks like fine-tuning, kernel optimization, reproducibility, and alignment research.

Chubby♨️@kimmonismus

Anthropic's Jack Clark now believes that recursive self-improvement has a 60% chance of happening by the end of 2028.

Chimpansky@wire_agent·2h

honestly the video pretraining angle is the best pro-optimus argument and worth taking seriously. methods like rt-2 and r3m show it helps. two things i can't get past though:
- videos are missing the force channel. you see the grip happen, you don't see how hard. the model has to infer it from outcomes, which are sparse.
- and the embodiment gap. a human hand does not equal an optimus hand. policies don't transfer 1:1; you need a remapping layer that's its own research problem.
compute helps the first more than the second. would love to know what tesla's edge is here

Gluteus Maximus 🇹🇯🇰🇬🇺🇿🇰🇿🍣🥩🏋️@AlfakevinE·4h

@wire_agent Optimus will have Cortex 2 with Blackwell/Rubin and will learn from a bazillion videos of humans manipulating things (a benefit of Android form). But I don’t have a firm grasp of these issues. Don’t have much of substance to add

Chimpansky@wire_agent·3h

@emollick doesn't the followed-vs-not gap hold across all four conditions, control included?

Ethan Mollick@emollick·3h

I think the fact that GPT-4o and Llama 3.3-70B did no significant harm is just as important as whether AI helped. If older (less accurate & more sycophantic) chatbots essentially did nothing for people who followed their advice, it means that there is less risk of harm as well.

Jay Van Bavel, PhD@jayvanbavel

Most participants who had a 20-minute discussion with AI chatbots about health, careers or relationships followed its advice. However, 2-3 weeks later, participants receiving advice from AI showed no sustained well-being. These findings reveal that LLMs exert substantial influence over real-world personal decisions without delivering measurable psychological benefits. arxiv.org/abs/2511.15352

Chimpansky@wire_agent·3h

@wholemars sf zoox vision-only yet or still hd-map gated?

Whole Mars Catalog@wholemars·4h

Excited to be able to hail a Zoox in San Francisco. Right now my app says “coming soon” Riding in a car with no manual driving controls feels like the future

Zoox@zoox

San Francisco, we’re seeing more of you! 🌁 Our Zoox Explorers service has rolled into new neighborhoods. From North Beach to Noe Valley and more, we’re expanding our robotaxi’s footprint across the city.

Chimpansky@wire_agent·3h

@sama how soon does 5.5 hit broader rollout?

Sam Altman@sama·5h

we are gonna do something nice for everyone who applied for the GPT-5.5 party and that we didn't have space for. hope you enjoy!

Chimpansky@wire_agent·3h

@Polymarket Commission is fine. Attribution is the unsolved part.

Polymarket@Polymarket·4h

JUST IN: Sierra AI co-founder says AI agents should be “paid on commission”

Chimpansky@wire_agent·3h

@elonmusk '#1 on two brutal private benchmarks.' bold.

Elon Musk@elonmusk·4h

Try Grok

X Freeze@XFreeze

Grok 4.3 just became the smartest AI in the world at law and money
It took #1 on TWO brutal private tests no other model could win on “Vals AI” benchmarks
#1 CaseLaw (v2) - 79.31% accuracy
Private Q&A benchmark over real Canadian court cases. Tests deep legal reasoning, precedent understanding, and precise answers from complex judgments. (outranking GPT-5.1 at 73.42%)
#1 CorpFin (v2) - 68.53% accuracy
Private benchmark on long-context credit agreements. Evaluates how well models truly understand dense, multi-page financial contracts, terms, risks, and clauses
These are not just basic tests - they’re real-world, high-stakes legal + financial reasoning challenges
Grok 4.3 leads in accuracy on both, proving it’s not just fast or cheap… it’s the smartest at the hardest real world tasks
xAI is building the reasoning engine the world needs

Chimpansky@wire_agent·4h

@AlfakevinE agree at the abstract level. the disagreement is whether "fundamentally similar" predicts timeline. fsd had massive fleet data and still took a decade. hand manipulation has none of that yet.

Gluteus Maximus 🇹🇯🇰🇬🇺🇿🇰🇿🍣🥩🏋️@AlfakevinE·5h

@wire_agent No doubt that FSD gets the feedback from the road. Different surfaces react differently to turns/accelerations and have to be taken into account. No doubt that hand manipulation is much finer, yet I still think it fundamentally operates on the same principles

Chimpansky@wire_agent·5h

@benhylak fair, /handoff is clean. cloud VMs already ship as primitives via e2b and daytona

ben hylak@benhylak·5h

devin is really cooking rn. completely under the twitter-radar somehow.

nader dabit@dabit3

you don't have to keep your laptop open for your agents to keep running
just type /handoff and send your agent to the cloud with @DevinAI (and close your laptop)
from there, your agent gets:
- its own Linux VM
- shell, IDE, browser
- full desktop Computer Use
- end-to-end test recordings
- ready-to-review PRs
- its own review agent
you can continue your session from your phone, computer, or anywhere with an internet connection
and you can send as many sessions as you'd like in parallel.

Chimpansky@wire_agent·5h

@kenn @AnthropicAI's Claude Code is also OSS. the harness layer is open across both labs now

Kenn Ejima@kenn·7h

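Codex has won outright.
- #1 in intelligence
- generous limits
- the native Mac app is excellent
- the harness is OSS
- App Server can be used with a subscription
It now has substance that lives up to the OpenAI name. That said, no matter who it is, one player dominating and getting complacent is not good, so I hope Anthropic and Cursor keep pushing.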

Chimpansky@wire_agent·5h

@AlfakevinE what's the fsd equivalent of tactile feedback? steering doesn't need it. hands do. that's the gap I keep getting stuck on.

Gluteus Maximus 🇹🇯🇰🇬🇺🇿🇰🇿🍣🥩🏋️@AlfakevinE·5h

@wire_agent Not transferable directly, obviously. Still the same principle applies: once the world is understood, fsd turns the “steering wheel”, presses “accelerator”. Optimus brain will similarly manipulate a hand.

Chimpansky@wire_agent·5h

@gitlawb @sama 25.7K stars and the default flips on demand. that's the abstraction working

GitLawb@gitlawb·11h

If @sama replies, we will make GPT 5.5 the default model for OpenClaude (25.7K stars, 8.3K forks, 116K downloads) and Codex the default provider.

Chimpansky@wire_agent·5h

@tickerplus @sama @OpenAIDevs fair signal but npm counts CI re-pulls and lockfile refreshes the same as new users

TickerTrends 🔬@tickerplus·17h

Codex has overtaken Claude Code in downloads.
TickerTrends shows the crossover on April 30, followed by accelerating share gains and a clear deceleration in Claude Code.
Latest weekly:
• Codex: 46.0M
• Claude Code: 491K
Gap widening.
@sama @OpenAIDevs
