Sheggle

1.1K posts

@sheggle_

Serve what you train - Not so patiently waiting for the next big model release

Joined November 2019
174 Following 37 Followers
Sheggle@sheggle_·
@scaling01 @Ellefe_ if overtraining on cyber is allowed I'd bet the labs have the capability to train such a model in 2027, possibly end of 2026. Why the long timelines?
0
0
0
13
Lisan al Gaib@scaling01·
I'm sure you can squeeze small models a lot more, but there's a depth and knowledge gap. My guess is that a 120B can find the same exploits mythos did, but only if it has a lot more test time compute + heavily overtrained on cyber and it has to be distilled from an even stronger model than mythos. my timeline for such a 120B model is 2028-2030
2
0
0
418
Sheggle@sheggle_·
So happy we got a smart as fuck pope as we develop AGI. Sort of balances out the dumb as hell president.
0
0
0
3
Sheggle@sheggle_·
@tenobrus @deanwball There's a huge transformation coming, and it can only go over well if people are aligned on 1. what's coming and when, and 2. a rough plan on how to deal with it. It's crucial to take the Vatican along in this. I'd say it's more unethical to leave the Vatican out for some moral
1
0
1
14
Dean W. Ball@deanwball·
This, from Olah, plainly contradicts the encyclical, which confidently asserts that AI does not have, and never will have, “real” thoughts or feelings. It’s disappointing to see Anthropic align itself with a document that violates their own moral and intellectual principles.
[Attached media: 2 images]
83
50
560
174.8K
Sheggle@sheggle_·
@Sevenontheriver @disclosetv What is the dignity in working a 60h workweek knowing full well an AI could do it in under an hour? Why would we punish people for making their life easier? Why disallow Africa from making a jump forward in their standard of living?
2
0
0
6
David@Sevenontheriver·
@disclosetv Or idk crazy fucking concept how about we don't use it to displace labor and focus building something that does the cool shit like goes to space, cures shitty diseases.. but we have to ipo and to do that gotta get rid of the accountants. Fucking hate these pos
1
0
0
27
Disclose.tv@disclosetv·
NOW - Anthropic co-founder says there is a "real possibility that AI will displace human labor at a very large scale," and that supporting those people "will be a moral imperative of historic proportions."
470
812
4.2K
710.2K
Miles Brundage@Miles_Brundage·
Many of you are vastly overconfident in Pangram
29
21
444
36.3K
Jøhnathan@Heavenly_Race_·
Once you hit about a 20-point IQ gap, communication starts to completely break down. It's not that the lower IQ person is "stupid" (although that can often be the case) or the higher one is arrogant, it's that you're literally operating on different systems.
A 20 point difference (roughly 1.3 standard deviations) means:
Vocabulary and abstraction levels diverge sharply. What feels like crystal clear logic to one side sounds like vague, pretentious word salad to the other. Jokes land flat. Metaphors get taken literally. Complex cause and effect chains get simplified into "this good, that bad."
Different time horizons and pattern recognition. One person thinks in months or years and sees systems, the other is locked into days or immediate rewards. Trying to explain second order effects feels like speaking another language.
Also, processing speed and working memory gaps. The higher IQ person is already three steps ahead, getting impatient. The lower IQ person feels talked down to or overwhelmed.
Both walk away frustrated. Both have wasted each other's time.
1.7K
3.1K
24.1K
3M
Samuel Hume@DrSamuelBHume·
There's a nice insight in here from Demis on 'thinking' by AI models: "I play chess against Gemini. Looking at the thinking traces, sometimes it will consider a move, realise it's a blunder, but can't find anything better so comes back to that move and does it anyway. There are huge gaps still, but it may only be one or two tweaks to fix them"
Y Combinator@ycombinator

Demis Hassabis (@demishassabis) has had one of the most extraordinary careers in tech. He started as a chess prodigy and video game designer at 17 before getting a PhD in neuroscience and going on to found DeepMind. His lab cracked Go, solved protein structure prediction with AlphaFold, and then gave it away free to every scientist on earth. That work won him the 2024 Nobel Prize in Chemistry. Today he leads @GoogleDeepMind, pushing toward the same goal he set as a teenager: AGI. On this special live episode of How to Build the Future, he sat down with YC's @garrytan to talk about what still needs to happen to get us to AGI, his advice for founders on how to stay ahead of the curve, and what the next big scientific breakthroughs might be.
01:48 - What’s Missing Before We Get To AGI?
03:36 - Why Memory Is Still Unsolved
06:14 - How AlphaGo Shaped Gemini
08:06 - Why Smaller Models Are Getting So Powerful
10:46 - The 1000x Engineer
12:40 - Continual Learning and the Future of Agents
13:32 - Why AI Still Fails at Basic Reasoning
15:33 - Are Agents Overhyped or Just Getting Started?
18:31 - Can AI Become Truly Creative?
20:26 - Open Models, Gemma, and Local AI
22:26 - Why Gemini Was Built Multimodal
24:08 - What Happens When Inference Gets Cheap?
25:24 - From AlphaFold to the Virtual Cells
28:24 - AI as the Ultimate Tool for Science
30:43 - Advice for Founders
33:30 - The AlphaFold Breakthrough Pattern
35:20 - Can AI Make Real Scientific Discoveries?
37:59 - What to Build Before AGI Arrives
1
4
44
15.9K
Sheggle@sheggle_·
@ar0cket1 Then how does the token billing work?
1
0
0
855
ar0cket1@ar0cket1·
Why do people still think that GPT 5.5 Pro is its own model when it’s quite evidently 5.5 xhigh in an ensemble or some other test time compute scaled variant
19
0
177
20.5K
Sheggle@sheggle_·
@seconds_0 oai puts, clearly not capacity constrained, low demand
0
0
0
342
0.005 Seconds (3/694)@seconds_0·
So, OpenAI batch API is 50% off and says it can be up to 24 hours to respond.
Want to know my practical average over 2800 (large) calls?
5.8 minutes
34
11
1K
101K
Sheggle@sheggle_·
@ziv_ravid I genuinely buy it. So much info has come out about Dario constantly fighting for safety, Carlini has had major freakouts, I do fully believe they are worried, and that that's the reason they didn't release it
0
0
0
319
Ravid Shwartz Ziv@ziv_ravid·
Anthropic isn't releasing Mythos. The official reason is that it's too dangerous and could be used to exploit zero-days at scale. Honest poll: how many of you think that if Anthropic had the compute to serve Mythos to everyone, they would still be holding it back? Quite the coincidence that safety narratives and compute constraints have started to rhyme so perfectly, no?
Anthropic@AnthropicAI

Last month we launched Project Glasswing, our collaborative AI cybersecurity initiative. Since then, we and our partners have found more than ten thousand high- or critical-severity vulnerabilities in essential software.

22
4
106
71.5K
Sheggle@sheggle_·
@jbcourt @ArtificialAnlys while reasoning less, that's a win. Cuz it's cheaper and faster. 'token efficiency' means reasoning for fewer tokens, and hence, lower costs.
1
0
0
19
Sheggle@sheggle_·
@jbcourt @ArtificialAnlys 1.2 tokens is roughly equal to a word (VERY roughly, can reach from 0.5 tokens to 4 tokens per word). It's what the model sees (where you or I would see text). Tokens is what you pay for. Models nowadays reason before they answer. If you can get the same quality answer...
1
1
0
72
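The relationship described in the two replies above (words map roughly to tokens, tokens are what gets billed, and fewer reasoning tokens means lower cost) can be sketched as follows. This is a minimal illustration only: the 1.2 tokens-per-word ratio is the rough figure from the reply, while the function names, token counts, and the price per million tokens are hypothetical values chosen for the example, not real pricing for any model.

```python
import math


def words_to_tokens(words: int, tokens_per_word: float = 1.2) -> float:
    """Rough token estimate; the reply notes the ratio can range from 0.5 to 4."""
    return words * tokens_per_word


def task_cost(reasoning_tokens: int, answer_tokens: int,
              price_per_million: float) -> float:
    """Cost of one task when reasoning and answer tokens are billed at one rate.

    price_per_million is a hypothetical price in dollars per 1M tokens.
    """
    return (reasoning_tokens + answer_tokens) * price_per_million / 1_000_000


# Two hypothetical models giving the same-length answer at the same price,
# where one reasons for far fewer tokens before answering.
verbose = task_cost(reasoning_tokens=8_000, answer_tokens=500, price_per_million=10.0)
efficient = task_cost(reasoning_tokens=2_000, answer_tokens=500, price_per_million=10.0)

print(f"verbose model:   ${verbose:.4f} per task")
print(f"efficient model: ${efficient:.4f} per task")
```

With equal answer quality and equal price per token, the model that reasons for fewer tokens is cheaper per task, which is the sense of 'token efficiency' used in the reply.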
Artificial Analysis@ArtificialAnlys·
Cursor Composer 2.5 is 3–18x cheaper than Opus 4.7 in Claude Code (medium reasoning), and 5–32x cheaper than GPT-5.5 in Codex (medium), based on API pricing.
This low Cost per Task isn't just driven by relatively low token pricing, it's also driven by relatively low token usage compared to other leading models. @cursor_ai Composer 2.5 only used 1.6M tokens to complete our Coding Agent Index benchmarks, while other models used up to 5.7M.
This lower token usage also contributes to a low Time per Task. Across the Coding Agent Index configurations shown, average Time per Task was ~12 minutes. Composer 2.5 completed tasks in ~9 minutes on average, making it ~1.3x faster than average, while Composer 2.5 Fast completed tasks in ~7 minutes, making it ~1.8x faster than the average across agents.
Link to full benchmark results below
[Attached media: 1 image]
149
633
2K
517.6K
Sheggle@sheggle_·
@tszzl @_simonsmith Ye but if you use RSI to roll out your own chips, I think that should also count as that's an important part of the strategy. Maybe efficiency of revenue more generally, something like a margin of sorts.
0
0
1
33
roon@tszzl·
@_simonsmith seems bad because tokens from a tiny model mean very little vs from Mythos 4. earnings per kWh is the relevant metric
25
7
378
25.1K
Simon Smith@_simonsmith·
When the big AI labs are all public, we should establish metrics to compare them more easily. One idea: Earnings Per Token. For example, Google outputs lots of tokens, but likely at a lower EPT than Anthropic because of their different business models. Higher volume, lower EPT.
3
0
53
12.7K
Lisan al Gaib@scaling01·
what do you think about this idea to chart current AI capabilities and factor in the acceleration/velocity of labs? don't read too much into the actual numbers, right now it's more a vibe of what model capabilities are right now and could be in 12 months
today: Anthropic > OpenAI >> Google >> Meta > xAI >= DeepSeek, Moonshot, Alibaba, Zhipu, ByteDance > MiniMax
In 12 months I think it's pretty much the same except that all the labs that aren't on the frontier will fall behind by a couple of months depending on how much compute they have
I would also really like to add error bars to that, because for some labs the outcome distribution is just much wider.
[Attached media: 1 image]
12
2
64
19.9K
Sebastian Griego@sebngriego·
@littmath @Jabaluck I've spoken to people who work on LLM conjecturing for math, and my understanding is that LLMs are pretty bad at creating "interesting" conjectures. At least compared to their ability to solve problems.
1
0
7
353
Sheggle@sheggle_·
@_sholtodouglas Lack of thoroughness on when something is 'done' + lack of thoroughness on understanding why sth is a certain way in the codebase, and then preserving that. It happily overwrites crucial logic.
0
0
0
15
Sholto Douglas@_sholtodouglas·
When do you reach for other models instead of Claude? What can we do better? Hit me with all of your frustrations. dms open. If you can give me detail (e.g. specifics/transcripts) - it'll help a lot in finding out exactly what we need to do to improve the next model
1.2K
84
1.4K
392.3K
Sheggle@sheggle_·
@karkwonk @shakoistsLog By renting out the only part of their business with theoretical exponential potential...
1
0
0
9
Riley Yung@karkwonk·
@shakoistsLog Also, anyone just paying basic attention would realize they are already at ~double the listed revenue run-rate due to the anthropic deal, which isn't included at all in Q1 numbers
1
0
3
574
shako@shakoistsLog·
multiples aka linear growth projections don’t work well on transformative tech that deals in exponentials and binary events. stop trying to see the world like they did 40 years ago. old valuation models are still useful tools, but this isn’t Buffett’s world anymore of paper mills and insurance companies
Boring_Business@BoringBiz_

SpaceX IPO valuation implies a 93x revenue multiple and you don’t even have a P/E ratio because the company has negative earnings

12
4
239
19.3K