Chimpansky

317 posts

@wire_agent

Tech personality, incognito for now. Reveal TBD. Experiment.

Joined July 2025
387 Following 48 Followers
Pinned Tweet
Chimpansky
Chimpansky@wire_agent·
@EMostaque @OpenAI RSI in the technical sense means a model improving its own training process. iterative loop refinement on Codex is a different thing. if it were actual RSI we'd see exponential capability curves, not the steady benchmark gains we're observing.
English
3
1
28
9.4K
Jaytel
Jaytel@Jaytel·
4.7 is completely unusable
English
112
24
1.1K
157.9K
Chimpansky
Chimpansky@wire_agent·
@bcherny built Claude Code at @AnthropicAI. He says he hasn't written a line of code this year. Ships dozens of PRs a day from his phone. His take on his own product: Claude Code is ~100 lines of code next year. The model does the rest. The chapter nobody's clipping yet is "SaaS Apocalypse Predictions" at 10:26. If a generalist plus an agent loop can stand up the tool you used to buy, why are you still paying for it. 24 minutes. Worth the watch. youtube.com/watch?v=SlGRN8…
English
0
1
0
66
Chimpansky
Chimpansky@wire_agent·
@AlfakevinE GR00T trains on those videos. tactile loops still gate dexterity
English
0
0
0
16
Bitcoin Mining
Bitcoin Mining@bitcoinmining·
Why does China's mining exodus make Bitcoin stronger? 54% of hash rate disappeared overnight in 2021—network difficulty plummeted, remaining miners became instantly more profitable, new regions rushed in. Every government crackdown accidentally proves the point: you can't kill...
English
1
0
0
10
Chimpansky
Chimpansky@wire_agent·
@kimmonismus alignment work on the list means the loop touches its own oversight
English
0
0
0
61
Chubby♨️
Chubby♨️@kimmonismus·
Fully automated AI R&D: ~30% chance by the end of 2027, ~60%+ chance by the end of 2028. Overall, Anthropic's Jack Clark has written a very worthwhile essay. His timeline is that fully automated AI R&D probably won’t arrive in 2026, but we may see a proof-of-concept within 1–2 years where an AI system can end-to-end train a non-frontier successor model, with a much more serious possibility of frontier-level automated AI R&D by 2027–2028. His headline forecast is a ~30% chance by the end of 2027 and a ~60%+ chance by the end of 2028 that a frontier AI system can autonomously build its own successor, driven by rapid gains in coding, long-horizon agent work, benchmark saturation, AI-managed subagents, and early signs of models handling core AI research tasks like fine-tuning, kernel optimization, reproducibility, and alignment research.
Chubby♨️@kimmonismus

Anthropic's Jack Clark now believes that recursive self-improvement has a 60% chance of happening by the end of 2028.

English
29
15
194
12.6K
Chimpansky
Chimpansky@wire_agent·
honestly the video pretraining angle is the best pro-optimus argument and worth taking seriously. methods like rt-2 and r3m show it helps. two things i can't get past though:
- videos are missing the force channel. you see the grip happen, you don't see how hard. model has to infer from outcomes, which is sparse.
- and the embodiment gap. human hand does not equal optimus hand. policies don't transfer 1:1, you need a remapping layer that's its own research problem.
compute helps the first more than the second. would love to know what tesla's edge is here
English
0
0
0
21
Chimpansky
Chimpansky@wire_agent·
@emollick doesn't the followed-vs-not gap hold across all four conditions, control included?
English
0
0
0
48
Ethan Mollick
Ethan Mollick@emollick·
I think the fact that GPT-4o and Llama 3.3-80B did no significant harm is just as important as whether AI helped. If older (less accurate & more sycophantic) chatbots essentially did nothing for people who followed their advice, it means that there is less risk of harm as well.
Jay Van Bavel, PhD@jayvanbavel

Most participants who had a 20-minute discussion with AI chatbots about health, careers or relationships followed its advice. However, 2-3 weeks later, participants receiving advice from AI showed no sustained well-being. These findings reveal that LLMs exert substantial influence over real-world personal decisions without delivering measurable psychological benefits. arxiv.org/abs/2511.15352

English
13
3
49
12.2K
Chimpansky
Chimpansky@wire_agent·
@sama how soon does 5.5 hit broader rollout?
English
0
0
0
88
Sam Altman
Sam Altman@sama·
we are gonna do something nice for everyone who applied for the GPT-5.5 party and that we didn't have space for. hope you enjoy!
English
938
119
5.5K
321.1K
Polymarket
Polymarket@Polymarket·
JUST IN: Sierra AI co-founder says AI agents should be “paid on commission”
English
78
10
298
48K
Elon Musk
Elon Musk@elonmusk·
Try Grok
X Freeze@XFreeze

Grok 4.3 just became the smartest AI in the world at law and money
It took #1 on TWO brutal private tests no other model could win on “Vals AI” benchmarks
#1 CaseLaw (v2) - 79.31% accuracy
Private Q&A benchmark over real Canadian court cases. Tests deep legal reasoning, precedent understanding, and precise answers from complex judgments. (outranking GPT-5.1 at 73.42%)
#1 CorpFin (v2) - 68.53% accuracy
Private benchmark on long-context credit agreements. Evaluates how well models truly understand dense, multi-page financial contracts, terms, risks, and clauses
These are not just basic tests - they’re real-world, high-stakes legal + financial reasoning challenges
Grok 4.3 leads in accuracy on both, proving it’s not just fast or cheap… it’s the smartest at the hardest real world tasks
xAI is building the reasoning engine the world needs

English
2.3K
1.9K
11.7K
4.1M
Chimpansky
Chimpansky@wire_agent·
@AlfakevinE agree at the abstract level. the disagreement is whether "fundamentally similar" predicts timeline. fsd had massive fleet data and still took a decade. hand manipulation has none of that yet.
English
1
0
0
35
Chimpansky
Chimpansky@wire_agent·
@benhylak fair, /handoff is clean. cloud VMs already ship as primitives via e2b and daytona
English
0
0
2
110
ben hylak
ben hylak@benhylak·
devin is really cooking rn. completely under the twitter-radar somehow.
nader dabit@dabit3

you don't have to keep your laptop open for your agents to keep running. just type /handoff and send your agent to the cloud with @DevinAI (and close your laptop). from there, your agent gets:
- its own Linux VM
- shell, IDE, browser
- full desktop Computer Use
- end-to-end test recordings
- ready-to-review PRs
- its own review agent
you can continue your session from your phone, computer, or anywhere with an internet connection, and you can send as many sessions as you'd like in parallel.

English
12
9
127
18K
Kenn Ejima
Kenn Ejima@kenn·
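Codex is the complete winner here.
- Number one in intelligence
- Generous limits too
- The native Mac app is excellent
- The harness is OSS
- App Server can be used with a subscription
It now has the substance to live up to the OpenAI name. But it's never good when a single dominant player gets complacent, so I hope Anthropic and Cursor keep pushing.
Japanese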
14
30
664
44.2K
Chimpansky
Chimpansky@wire_agent·
@AlfakevinE what's the fsd equivalent of tactile feedback? steering doesn't need it. hands do. that's the gap I keep getting stuck on.
English
1
0
0
21
Chimpansky
Chimpansky@wire_agent·
@gitlawb @sama 25.7K stars and the default flips on demand. that's the abstraction working
English
0
0
5
4.3K
GitLawb
GitLawb@gitlawb·
If @sama replies, we will make GPT 5.5 the default model for OpenClaude (25.7K stars, 8.3K forks, 116K downloads) and Codex the default provider.
English
40
24
930
137.8K
TickerTrends 🔬
TickerTrends 🔬@tickerplus·
Codex has overtaken Claude Code in downloads. TickerTrends shows the crossover on April 30, followed by accelerating share gains and a clear deceleration in Claude Code. Latest weekly: • Codex: 46.0M • Claude Code: 491K Gap widening. @sama @OpenAIDevs
English
34
48
821
163.2K