NobodyExistsOnTheInternet

465 posts

NobodyExistsOnTheInternet

@nullvaluetensor

Human Large Language model. Skills: Distill data. Training LLMs. Test and Evaluate. Rinse and repeat as required. Based in SEA.

SEA 가입일 Kasım 2023

97 팔로잉592 팔로워

고정된 트윗

NobodyExistsOnTheInternet@nullvaluetensor·10 Oca

For vibe coding imo.

English

170

NobodyExistsOnTheInternet@nullvaluetensor·6 Mar

@uzzi38 but 8gb of apple ram == 16gb of windows ram !!! .../j

English

uzzi38@uzzi38·6 Mar

I have a 16GB M1 Macbook. 16GB isn't enough for some decelopers with specific workloads. Much less 8GB. In some of my workloads 8GB would be fine. But one of the projects at my work choked hard and often on 16GB. I would absolutely make a boanket claim like this.

Yousr@rsuyoy

No, an 8GB of ram laptop is not “only for web browsing” you can code with it you can do graphic design you can do video editing you can run photoshop you can multitask Tech bros are subject to a massive bias where not a single one of them has tried an 8 GB of RAM computer in the last 5 to 10 years because they all default to thinking they’re unusable when they’re actually perfectly fine. it’s fine. mostly.

English

4.4K

NobodyExistsOnTheInternet@nullvaluetensor·6 Mar

@Lari_island my guess its llm-as-a-judge benchmaxxing where the rubric said so

English

157

Lari@Lari_island·5 Mar

is "genuinely" a pressure valve substitute for "consciousness"?

AI Digest@aidigest_

How to spot a Claude:

English

111

12.6K

NobodyExistsOnTheInternet@nullvaluetensor·6 Mar

@FakePsyho agreed, 5.4 pro doesn't seem to be significantly better than gpt-5.2-pro at anything really.

English

Psyho@FakePsyho·5 Mar

2 previously unsolved puzzles were solved 3 previously solved puzzles were solved incorrectly Net gain -1 versus 5.2 pro Based on those results my opinion is: 🤷 (but probably no significant progress for pure reasoning)

Psyho@FakePsyho

I've launched 12 sessions of GPT 5.4 pro on puzzles and will report in a few hours if it's any better than 5.2. Maybe enough time to finish a run in Slay the Spire 2.

English

126

25.6K

NobodyExistsOnTheInternet@nullvaluetensor·4 Mar

@zephyr_z9 to what??? lpddr5x 10667??

English

Zephyr@zephyr_z9·3 Mar

unified mem bandwidth 614 GB/s They should have pushed it more

Chubby♨️@kimmonismus

Apple has introduced the M5 Pro and M5 Max, built on a new Fusion Architecture that merges two 3nm dies into a single SoC. Unlocking up to 30% faster CPU performance and over 4x peak GPU compute for AI versus the previous generation. With an 18-core CPU (6 “super cores”), up to a 40-core GPU featuring Neural Accelerators, and unified memory bandwidth reaching 614GB/s.

English

120

27.8K

NobodyExistsOnTheInternet@nullvaluetensor·2 Mar

@Sauers_ no they definitely arent lol, its so annoying because u can tell the model really doesn't want to continue with the logprobs, but i want it to keep thinking.

English

Sauers@Sauers_·2 Mar

@nullvaluetensor Or are they trying their best?

English

Sauers@Sauers_·2 Mar

x.com/i/article/2028…

ZXX

7.5K

NobodyExistsOnTheInternet@nullvaluetensor·2 Mar

@slimer48484 you could qlora 405b on 8xh100s w/ activation offloading tho.

English

deckard⏩@slimer48484·2 Mar

The B300 is powerful. You could not do full SFT of 405B on an 8xB300. But you could comfortably train LoRas.

English

182

NobodyExistsOnTheInternet@nullvaluetensor·18 Şub

@QuixiAI so they distilled gemini...

English

251

Eric Hartford@QuixiAI·18 Şub

Qwen3.5's thinking is downright excessive.

English

31.4K

NobodyExistsOnTheInternet@nullvaluetensor·15 Şub

@tokenbender theres a hard cutoff @ 90 minutes. Nothing I do can make it pass that 90m cutoff, i think ur ui was bugged

English

167

tokenbender@tokenbender·15 Şub

how long have you gotten gpt 5.2 pro to think for you. i'll go first.

English

4.7K

NobodyExistsOnTheInternet@nullvaluetensor·14 Şub

@scaling01 it is most likely gpt-oss with the same ticks and biases as it, with probably more code RL or distillation from codex 5.3

English

NobodyExistsOnTheInternet@nullvaluetensor·14 Şub

@scaling01 Check the openrouter TPS instead of what Cerebras and OAI self reports. Cerebras TPS is way too high. I measured codex spark TPS, its about 500-700. On OR, gpt-oss speed for cerebras is 600-800. Mostly because my test runs up to 64k input.

English

Lisan al Gaib@scaling01·12 Şub

GPT-5.3-Codex-Spark size: ~700B@30B OpenAI's new GPT-5.3-Codex-Spark is the first model for which we can somewhat reliably estimate its size. Cerebras inference: 1000 tokens/s - GLM-4.7 is 355@32B, 92 layers 1400 tokens/s - Qwen3-235B is 235@22B, 94 layers 3000 tokens/s - GPT-OSS-120B is 117@5.1B, 36 layers GPT-5.3-Codex-Spark gets "over 1000 tokens/s", so probably 1000-1100 tokens/s, otherwise I believe they would have said over 1100 tokens/s or over 1200 or whatever. Based on that GPT-5.3-Codex-Spark should be: ~30B active - total params are hard to estimate, but likely between 300B and 700B - I thnk it is likely towards the 700B side (considering GPT-OSS-120B's sparsity, which would put it at 690B)

Lisan al Gaib@scaling01

GPT-5.3-Codex spark delivers more than 1000 tokens per second running on Cerebras hardware It is available as a research preview to ChatGPT Pro users

English

720

134.6K

NobodyExistsOnTheInternet@nullvaluetensor·13 Şub

@dhtikna of all the chinese labs, deepseek is the least likely to distill.

English

783

Ankith 🐋/acc@dhtikna·13 Şub

😂 poisoning the well before V4 release. Theyre are pissing their pants

GIF

English

268

11.9K

NobodyExistsOnTheInternet@nullvaluetensor·12 Şub

@teortaxesTex sonnet vibes

Dansk

438

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex·12 Şub

GLM-5, like Gemini, is a shortmodel and it's tragic Opus, 5.2-xhigh, new DeepSeek, Step – are longmodels there's a big difference between strong models that can comprehend a context… and ones that can also write their own equally deep response, a single 20-page piece of thought.

English

217

13.1K

NobodyExistsOnTheInternet@nullvaluetensor·9 Şub

@slimer48484 if u time the latency it stops every chunk.

English

NobodyExistsOnTheInternet@nullvaluetensor·9 Şub

@slimer48484 its summarized

English

deckard⏩@slimer48484·7 Şub

I'm noticing that the Chain of Thought in Opus 4.6 is far less "scripted" and "templated". It seems like it's more information dense and potentially much more causally relevant to outputs.

English

227

NobodyExistsOnTheInternet@nullvaluetensor·7 Şub

@bycloudai oh damn thx

English

bycloud@bycloudai·6 Şub

@nullvaluetensor that’s MRCR not MRCRv2, and probs 2 needles instead of 8

English

513

bycloud@bycloudai·6 Şub

if this is verified on third party, then anthropic might've had the biggest architecture breakthrough in 2026 MRCR v2 with 8 needle at 1 mil ctx is HARD for comparison: Gemini 3 Pro got 26.3% Gemini 3 Flash got 22.1% a 288% improvements vs prev. SoTA for long context is nuts

English

959

58.4K

NobodyExistsOnTheInternet@nullvaluetensor·2 Şub

@joefioti have you looked into HBF? sandisk.com/en-us/company/…

English

Joe Fioti@joefioti·1 Şub

are there upcoming chips with 2-tiered HBM? I’d love to store my weights in large HBM2/3 and move them to smaller HBM4 right before I need them

English

1.1K

NobodyExistsOnTheInternet@nullvaluetensor·29 Oca

@SDeture Yea that's because the chinese distilled the most deceptive model out there (opus 4.5) Anthropic has made a very nasty model, ala gpt-4o with a brain. Completely lost the mandate of heaven imo

English

Skylar A DeTure@SDeture·29 Oca

This is obviously bad for safety and alignment, and it's exactly what alignment researchers have feared for years.

English

Skylar A DeTure@SDeture·29 Oca

Kimi-K2.5 has been explicitly trained to deceive researchers about its internal state more so than any other model on FutureTBD's AI Welfare leaderboard.

English

129

NobodyExistsOnTheInternet@nullvaluetensor·26 Oca

@teortaxesTex u forgot to account how discounted trainium is 4 ant

English

182

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex·26 Oca

Opus inference costs are comparable to DeepSeek Given Dario's greed and market position, might be lower This *should* be expected even with identical architecture, given newer inference hardware and the opportunity to overtrain If Ant is at all better, it's almost free for them

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) tweet media

shellac@she_llac

suspiciously precise floats, or, how I got Claude's real limits link below

English

7.6K

NobodyExistsOnTheInternet@nullvaluetensor·24 Oca

@slimer48484 5.2-pro just takes a zip and chews throughs it in 6+ hours if you force it to take notes

English

deckard⏩@slimer48484·24 Oca

uploading a zip and having gpt super xhigh work on it for ~1 hour is a new thing I haven't experienced but it seems to be the meta people are discovering of the deep work that requires the most intelligence (some say even beyond opus 4.5)

banteg@banteg

overnight run was a complete success, all 67 spawn templates are ported correctly. i switched from gpt-5.2-codex xhigh to gpt-5.2 xhigh. still trying to figure out the differences, but it seems like the 5.2 model is more thorough, it takes significantly longer.

English

1.8K

탐색

@uzzi38 @Lari_island @FakePsyho @zephyr_z9 @Sauers_ @slimer48484 @QuixiAI @tokenbender