Sheggle

1.1K posts

@sheggle_

Serve what you train - Not so patiently waiting for the next big model release

Joined November 2019

174 Following · 37 Followers

Sheggle@sheggle_·15h

@scaling01 @Ellefe_ if overtraining on cyber is allowed I'd bet the labs have the capability to train such a model in 2027, possibly end of 2026. Why the long timelines?

Lisan al Gaib@scaling01·2d

I'm sure you can squeeze small models a lot more, but there's a depth and knowledge gap. My guess is that a 120B can find the same exploits mythos did, but only if it has a lot more test time compute + heavily overtrained on cyber and it has to be distilled from an even stronger model than mythos. my timeline for such a 120B model is 2028-2030

Lisan al Gaib@scaling01·2d

x.com/i/article/2058…

Sheggle@sheggle_·21h

So happy we got a smart as fuck pope as we develop AGI. Sort of balances out the dumb as hell president.

Sheggle@sheggle_·21h

@tenobrus @deanwball high ground. The ends justify the means here imo.

Sheggle@sheggle_·21h

@tenobrus @deanwball There's a huge transformation coming, and it can only go over well if people are aligned on 1. what's coming and when, and 2. a rough plan on how to deal with it. It's crucial to take the Vatican along in this. I'd say it's more unethical to leave the Vatican out for some moral

Dean W. Ball@deanwball·1d

This, from Olah, plainly contradicts the encyclical, which confidently asserts that AI does not have, and never will have, “real” thoughts or feelings. It’s disappointing to see Anthropic align itself with a document that violates their own moral and intellectual principles.

Sheggle@sheggle_·21h

@Sevenontheriver @disclosetv What is the dignity in working a 60h workweek knowing full well an AI could do it in under an hour? Why would we punish people for making their life easier? Why stop Africa from making a jump forward in its standard of living?

David@Sevenontheriver·23h

@disclosetv Or idk, crazy fucking concept, how about we don't use it to displace labor and focus on building something that does the cool shit, like going to space, curing shitty diseases... but we have to IPO, and to do that we gotta get rid of the accountants. Fucking hate these pos

Disclose.tv@disclosetv·2d

NOW - Anthropic co-founder says there is a "real possibility that AI will displace human labor at a very large scale," and that supporting those people "will be a moral imperative of historic proportions."

Sheggle@sheggle_·1d

@Miles_Brundage @tenderizzation

Miles Brundage@Miles_Brundage·1d

Many of you are vastly overconfident in Pangram

Sheggle@sheggle_·1d

@tenderizzation @Heavenly_Race_ @pangramlabs My goat, feel free to just call 'written by AI'. No need to outsource every decision to the black box gods

tender@tenderizzation·1d

@Heavenly_Race_ @pangramlabs ?

Jøhnathan@Heavenly_Race_·2d

Once you hit about a 20-point IQ gap, communication starts to completely break down. It's not that the lower-IQ person is "stupid" (although that can often be the case) or that the higher one is arrogant; it's that you're literally operating on different systems. A 20-point difference (roughly 1.3 standard deviations) means:
Vocabulary and abstraction levels diverge sharply. What feels like crystal-clear logic to one side sounds like vague, pretentious word salad to the other. Jokes land flat. Metaphors get taken literally. Complex cause-and-effect chains get simplified into "this good, that bad."
Different time horizons and pattern recognition. One person thinks in months or years and sees systems; the other is locked into days or immediate rewards. Trying to explain second-order effects feels like speaking another language.
Also, processing speed and working memory gaps. The higher-IQ person is already three steps ahead, getting impatient. The lower-IQ person feels talked down to or overwhelmed. Both walk away frustrated. Both have wasted each other's time.

Sheggle@sheggle_·1d

@DrSamuelBHume Poor Gemini😭

Samuel Hume@DrSamuelBHume·1d

There's a nice insight in here from Demis on 'thinking' by AI models "I play chess against Gemini. Looking at the thinking traces, sometimes it will consider a move, realise it's a blunder, but can't find anything better so comes back to that move and does it anyway. There are huge gaps still, but it may only be one or two tweaks to fix them"

Y Combinator@ycombinator

Demis Hassabis (@demishassabis) has had one of the most extraordinary careers in tech. He started as a chess prodigy and video game designer at 17 before getting a PhD in neuroscience and going on to found DeepMind. His lab cracked Go, solved protein structure prediction with AlphaFold, and then gave it away free to every scientist on earth. That work won him the 2024 Nobel Prize in Chemistry. Today he leads @GoogleDeepMind, pushing toward the same goal he set as a teenager: AGI.
On this special live episode of How to Build the Future, he sat down with YC's @garrytan to talk about what still needs to happen to get us to AGI, his advice for founders on how to stay ahead of the curve, and what the next big scientific breakthroughs might be.
01:48 What’s Missing Before We Get To AGI?
03:36 Why Memory Is Still Unsolved
06:14 How AlphaGo Shaped Gemini
08:06 Why Smaller Models Are Getting So Powerful
10:46 The 1000x Engineer
12:40 Continual Learning and the Future of Agents
13:32 Why AI Still Fails at Basic Reasoning
15:33 Are Agents Overhyped or Just Getting Started?
18:31 Can AI Become Truly Creative?
20:26 Open Models, Gemma, and Local AI
22:26 Why Gemini Was Built Multimodal
24:08 What Happens When Inference Gets Cheap?
25:24 From AlphaFold to the Virtual Cells
28:24 AI as the Ultimate Tool for Science
30:43 Advice for Founders
33:30 The AlphaFold Breakthrough Pattern
35:20 Can AI Make Real Scientific Discoveries?
37:59 What to Build Before AGI Arrives

Sheggle@sheggle_·3d

@ar0cket1 Then how does the token billing work?

ar0cket1@ar0cket1·3d

Why do people still think that GPT 5.5 Pro is its own model when it's quite evidently 5.5 xhigh in an ensemble, or some other test-time-compute-scaled variant?

Sheggle@sheggle_·3d

@seconds_0 oai puts, clearly not capacity constrained, low demand

0.005 Seconds (3/694)@seconds_0·3d

So, the OpenAI batch API is 50% off and says it can take up to 24 hours to respond.
Want to know my practical average over 2800 (large) calls?
5.8 minutes

Sheggle@sheggle_·4d

@ziv_ravid I genuinely buy it. So much info has come out about Dario constantly fighting for safety, Carlini has had major freakouts, I do fully believe they are worried, and that that's the reason they didn't release it

Ravid Shwartz Ziv@ziv_ravid·4d

Anthropic isn't releasing Mythos. The official reason is that it's too dangerous and could be used to exploit zero-days at scale.
Honest poll: how many of you think that if Anthropic had the compute to serve Mythos to everyone, they would still be holding it back? Quite the coincidence that safety narratives and compute constraints have started to rhyme so perfectly, no?

Anthropic@AnthropicAI

Last month we launched Project Glasswing, our collaborative AI cybersecurity initiative. Since then, we and our partners have found more than ten thousand high- or critical-severity vulnerabilities in essential software.

Sheggle@sheggle_·4d

@jbcourt @ArtificialAnlys while reasoning less, that's a win. Cuz it's cheaper and faster. 'token efficiency' means reasoning for fewer tokens, and hence, lower costs.

Sheggle@sheggle_·4d

@jbcourt @ArtificialAnlys 1.2 tokens is roughly equal to a word (VERY roughly; it can range from 0.5 to 4 tokens per word). It's what the model sees (where you or I would see text). Tokens are what you pay for. Models nowadays reason before they answer. If you can get the same quality answer...

Artificial Analysis@ArtificialAnlys·4d

Cursor Composer 2.5 is 3–18x cheaper than Opus 4.7 in Claude Code (medium reasoning), and 5–32x cheaper than GPT-5.5 in Codex (medium), based on API pricing.
This low Cost per Task isn't just driven by relatively low token pricing; it's also driven by relatively low token usage compared to other leading models. @cursor_ai Composer 2.5 only used 1.6M tokens to complete our Coding Agent Index benchmarks, while other models used up to 5.7M. This lower token usage also contributes to a low Time per Task.
Across the Coding Agent Index configurations shown, average Time per Task was ~12 minutes. Composer 2.5 completed tasks in ~9 minutes on average, making it ~1.3x faster than average, while Composer 2.5 Fast completed tasks in ~7 minutes, making it ~1.8x faster than the average across agents.
Link to full benchmark results below

Sheggle@sheggle_·5d

@stochasticchasm @LLMenjoyer pulling out nails and covering the lower half of the body in third degree burns. RL is cruel

stochasm@stochasticchasm·5d

@LLMenjoyer what algorithm will they use

llm_enjoyer@LLMenjoyer·5d

they're RLing harmful biases out of me tomorrow

stochasm@stochasticchasm

they're abliterating me tomorrow

Sheggle@sheggle_·5d

@tszzl @_simonsmith Ye but if you use RSI to roll out your own chips, I think that should also count as that's an important part of the strategy. Maybe efficiency of revenue more generally, something like a margin of sorts.

roon@tszzl·5d

@_simonsmith seems bad because tokens from a tiny model mean very little vs from Mythos 4. earnings per kWh is the relevant metric

Simon Smith@_simonsmith·6d

When the big AI labs are all public, we should establish metrics to compare them more easily. One idea: Earnings Per Token. For example, Google outputs lots of tokens, but likely at a lower EPT than Anthropic because of their different business models. Higher volume, lower EPT.

Sheggle@sheggle_·5d

@scaling01 Where's Mistral, pride of Europe?

Lisan al Gaib@scaling01·5d

what do you think about this idea to chart current AI capabilities and factor in the acceleration/velocity of labs?
don't read too much into the actual numbers, right now it's more a vibe of what model capabilities are right now and could be in 12 months
today: Anthropic > OpenAI >> Google >> Meta > xAI >= DeepSeek, Moonshot, Alibaba, Zhipu, ByteDance > MiniMax
In 12 months I think it's pretty much the same, except that all the labs that aren't on the frontier will fall behind by a couple of months depending on how much compute they have.
I would also really like to add error bars to that, because for some labs the outcome distribution is just much wider.

Sheggle@sheggle_·5d

@sebngriego @littmath @Jabaluck They used to be bad at grade school math as well

Sebastian Griego@sebngriego·5d

@littmath @Jabaluck I've spoken to people who work on LLM conjecturing for math, and my understanding is that LLMs are pretty bad at creating "interesting" conjectures. At least compared to their ability to solve problems.

Jason Abaluck@Jabaluck·5d

I don't think the issue is whether problems will run out. It's whether human comparative advantage at solving problems, asking questions, or communicating solutions will run out. Current trends seem to suggest "yes" on all counts, and soon-ish.

Daniel Litt@littmath

Unit distances result is very exciting, but re: “math is solved” — humans regularly solve long-open problems, and yet infinitely more interesting open problems remain.

Sheggle@sheggle_·5d

@0xTuongLam @trenchbet @redhairshanks86 U got baited bro💀

Lam Nguyen (Monad arc)@0xTuongLam·6d

@trenchbet @redhairshanks86 2 trillion yuan (¥), not yen. Around $300B

Squiggly Hair Shanks@redhairshanks86·6d

bruv, give this number in USD
there isn't a single person in the world - except for 1.5 billion people - who know how much that number is
could be billions, could be a trillion, i just don't know bc no one speak YUAN here

Ted@TedPillows

¥2,000,000,000,000 erased from Chinese stock market today. It's happening.

Sheggle@sheggle_·6d

@_sholtodouglas Lack of thoroughness on when something is 'done' + lack of thoroughness on understanding why sth is a certain way in the codebase, and then preserving that. It happily overwrites crucial logic.

Sholto Douglas@_sholtodouglas·17 May

When do you reach for other models instead of Claude? What can we do better? Hit me with all of your frustrations. DMs open.
If you can give me detail (e.g. specifics/transcripts), it'll help a lot in finding out exactly what we need to do to improve the next model.

Sheggle@sheggle_·6d

@karkwonk @shakoistsLog By renting out the only part of their business with theoretical exponential potential...

Riley Yung@karkwonk·6d

@shakoistsLog Also, anyone just paying basic attention would realize they are already at ~double the listed revenue run-rate due to the anthropic deal, which isn't included at all in Q1 numbers

shako@shakoistsLog·6d

multiples, aka linear growth projections, don’t work well on transformative tech that deals in exponentials and binary events. stop trying to see the world like they did 40 years ago. old valuation models are still useful tools, but this isn’t Buffett's world of paper mills and insurance companies anymore

Boring_Business@BoringBiz_

SpaceX IPO valuation implies a 93x revenue multiple and you don’t even have a P/E ratio because the company has negative earnings
