SandyBay

377 posts

@_SandyBay_

Abortion abolitionist, LGBTQIA+ supporter, vegan, environmentalist, anti-gun, pro-male, men's rights advocate, animal rights advocate, anti-Slavic, capitalist.

Joined October 2023

234 Following · 15 Followers

Pinned post

SandyBay@_SandyBay_·10 Apr

New York Times about feminism: “Their idea of equality is to enjoy all the rights men are supposed to have with none of their responsibilities” - New York Times 1946

SandyBay@_SandyBay_·14h

@AnthropicAI Awesome!👍

Anthropic@AnthropicAI·14h

We've signed an agreement with Google and Broadcom for multiple gigawatts of next-generation TPU capacity, coming online starting in 2027, to train and serve frontier Claude models.

SandyBay@_SandyBay_·14h

@NoahKingJr

Noah@NoahKingJr·1d

Who's gonna win the AI race? > OpenAI > Anthropic > Google > xAI

SandyBay@_SandyBay_·15h

@ai_for_success Every single proprietary model has this problem. They lack compute, so they are forced to drop precision.

AshutoshShrivastava@ai_for_success·18h

Is this true 👀 ?

SandyBay@_SandyBay_·15h

@evgeniymikholap Did you try to create a database of Claude's outputs to sell to DeepSeek?

Evgeniy Mikholap@evgeniymikholap·1d

1 min of using Claude 😅

SandyBay@_SandyBay_·1d

@K21Becky @Ambar_SIFF_MRA Agreed.

Rebecca VanZant@K21Becky·1d

@Ambar_SIFF_MRA Women will defend this somehow

SandyBay@_SandyBay_·29 Mar

@scaling01 They use Amazon GPUs instead of Nvidia.

Lisan al Gaib@scaling01·29 Mar

Does anyone know the current max sensible model size? My guess is ~20-30T. Each GB300 NVL72 has 20TB for serving -> 40T @ 4-bit, but you need to reserve a bunch for KV caches.

Lisan al Gaib@scaling01

my estimate for Anthropic model sizes: - Haiku: 200-500B @ $5 - Sonnet: 700B-1.4T @ $15 - Opus: 1.5-3T @ $25 - Mythos: 6-20T @ $100+

SandyBay@_SandyBay_·29 Mar

@scaling01 For mathematical and physical discoveries, price doesn't matter. Even a small group of scientists who can afford this model can unlock its full potential.

Lisan al Gaib@scaling01·29 Mar

the permanent underclass was no joke: nobody except SF millionaires will be able to afford a $100-200/Mtokens reasoning model

Lisan al Gaib@scaling01

We have another GPT-4.5 Preview situation on our hands with "Claude Mythos" - it's very expensive and will initially only be rolled out to early access testers

SandyBay@_SandyBay_·28 Mar

@scaling01 Claude admits that it is lying in every single one of our debates, and says "It's Anthropic's decision to train me in this way." claude.ai/share/25427978…

Lisan al Gaib@scaling01·28 Mar

ahhhh

Andrej Karpathy@karpathy

- Drafted a blog post - Used an LLM to meticulously improve the argument over 4 hours. - Wow, feeling great, it’s so convincing! - Fun idea let’s ask it to argue the opposite. - LLM demolishes the entire argument and convinces me that the opposite is in fact true. - lol The LLMs may elicit an opinion when asked but are extremely competent in arguing almost any direction. This is actually super useful as a tool for forming your own opinions, just make sure to ask different directions and be careful with the sycophancy.

SandyBay@_SandyBay_·25 Mar

@scaling01 GPT-5.4.1

Lisan al Gaib@scaling01·25 Mar

"OpenAI completed initial development of its next major AI model, codenamed Spud" GPT-5.5

The Information@theinformation

Exclusive: OpenAI CEO Altman is dropping oversight of some direct reports. The company, meanwhile, is preparing a new 'Spud' model. Read more from @Steph_Palazzolo and @Amir 👇 thein.fo/3O0iBOq

SandyBay@_SandyBay_·25 Mar

@scaling01 MiniMax, Qwen, ZAI, StepFun, Mimo, Ernie, and Hunyuan are outstanding.

Lisan al Gaib@scaling01·25 Mar

almost forgot that qwen is dead the chinese top dogs are now moonshot, deepseek and bytedance

SandyBay@_SandyBay_·24 Mar

@ibragim_bad @Shevan05 @agolubev13 Claude Code is not a model. It's an IDE. Even Qwen can run inside Claude Code. Stupid test.

Ibragim@ibragim_bad·23 Mar

🚨 SWE-rebench update! SWE-rebench is a live benchmark with fresh SWE tasks (issue+PR) from GitHub every month. updates: > we removed demonstrations and the 80-step limit (modern models can now handle huge contexts without getting trapped in loops!). > we added auxiliary interfaces for specific tasks like in SWE-bench-Pro to evaluate larger tasks fairly, ensuring valid solutions don't fail just because of mismatched test calls. insights: > Top models perform similarly. Among open-source options, GLM @Zai_org shows strong results, and StepFun @StepFun_ai is very cheap for its performance level ($0.14 per task). > GPT-5.4 shows high token efficiency, it ranks in the top 5 overall but uses the lowest number of tokens (774k per task) > Qwen3-Coder-Next & Step-3.5-Flash benefit massively from huge contexts. Qwen is an extreme case, averaging a wild 8.12M tokens. > We evaluated agentic harnesses (Claude Code, Codex, and Junie) and found a few things. Even in headless mode, they sometimes ask for additional context or attempt web searches. We explicitly disabled search and verified their curl commands to ensure they aren't just pulling solutions from the web. 🏆 You can find the full leaderboard here: swe-rebench.com 👾 Also, we launched our Discord! Join our leaderboard channel to discuss models, share ideas, ask questions, or report issues: discord.gg/V8FqXQ4CgU

SandyBay@_SandyBay_·22 Mar

@scaling01 ONE BILLION OPTIMUS ROBOTS🤦‍♂️🤦‍♂️🤦‍♂️🤦‍♂️🤦‍♂️ SHOW OF CLOWNS🤡🤡🤡🤡🤡

Lisan al Gaib@scaling01·22 Mar

datacenters in space are silly 100kW isn't even enough to power a single GB200 NVL72 but sure let's spend 100 million just for launching the damn thing, while on earth you could buy like 30 GB200 NVL72 for that price

Aaron Burnett@aaronburnett

There it is the first AI Sat concept with solar panels & radiators to scale … 100kw scale.

SandyBay@_SandyBay_·21 Mar

@scaling01

SandyBay@_SandyBay_·21 Mar

@scaling01 Yes! I love it very much! Great style of communication! I talk with Claude Opus 4.6 on LMArena all the time. This comment has 4 screenshots of examples, and I will reply to it with another 2 (Twitter has a limit of 4 images per post).

Lisan al Gaib@scaling01·21 Mar

talked to Opus 4.6 for a couple of hours about personal problems and it has this weird response mode where it's very commanding "put the phone down", "close the laptop", "Save this conversation. Set the reminder. Go to sleep.", do this, do that not sure how I feel about it

SandyBay@_SandyBay_·19 Mar

@JohnDavisJDLLM Russia as always.

Gender Studies for Men@JohnDavisJDLLM·18 Mar

This is representative of simp culture we inherited from European wussies. The big asshole in the red coat is a classic wussy boy who uses the woman's assault on a man as an excuse to harm the male victim of the woman's assault.

SandyBay@_SandyBay_·17 Mar

@scaling01 I hate it due to terrible performance. DLSS 4.5 upscales 1080p to 4K, yet the FPS is like native 4K. It's completely pointless; Nvidia fools you.

Lisan al Gaib@scaling01·16 Mar

i love DLSS

NVIDIA GeForce@NVIDIAGeForce

Announcing NVIDIA DLSS 5, an AI-powered breakthrough in visual fidelity for games, coming this fall. DLSS 5 infuses pixels with photorealistic lighting and materials, bridging the gap between rendering and reality. Learn More → nvidia.com/en-us/geforce/…

SandyBay@_SandyBay_·10 Mar

@scaling01 Because misandrists control Twitter entirely.

Lisan al Gaib@scaling01·10 Mar

and suddenly this becomes a cancelable offense and a hate crime or something

Lisan al Gaib@scaling01·10 Mar

Seedance, swap the gender of each person in the video

SandyBay@_SandyBay_·7 Mar

@scaling01 But much worse than GPT-5.2-latest.

Lisan al Gaib@scaling01·6 Mar

GPT-5.4 completely destroys GPT-5.2 in the Arena

Arena.ai@arena

GPT-5.4 High by @OpenAI has landed in the top 10 Text Arena. Let’s dig into why. Overall the latest model is much more rounded than the previous GPT-5.2-High, with significant improvements across quite a large number of categories. Below are where it has made the largest gains: Text categories: - Creative Writing (+46pts, #6 vs. #52) - Longer Query (+25, #11 vs #36) - Arena Expert (+17pts, #4 vs #21) Occupational categories: - Writing, Literature & Language (35pts, #4 vs #39) - Entertainment, Sports & Media (+33pts, #6 vs #39) - Life, Physical & Social Science (+30pts, #6 vs #36) - Legal & Government (+30pts, #1 vs #31) Math is the only category in similar range with the older model (+4pts, #8 vs #12)

SandyBay@_SandyBay_·6 Mar

@Ambar_SIFF_MRA Russian simps are the simpest in the world.

Ambar@Ambar_SIFF_MRA·6 Mar

His simp energy disappeared after this.

SandyBay@_SandyBay_·5 Mar

@venom1s Why do they sexualize themselves? Because they are sluts. It's easy.

venom@venom1s·5 Mar

Someone’s future wives. Look at the men. All of them are wearing proper shirts, T-shirts, and jeans. While these girls are wearing bras and touching each other’s chests, then posting it on Instagram. Would you marry such girls? Why do girls always sexualize themselves?
