NobodyExistsOnTheInternet
@nullvaluetensor

465 posts

Human Large Language Model. Skills: distilling data, training LLMs, testing and evaluating. Rinse and repeat as required. Based in SEA.

SEA · Joined November 2023
97 Following · 592 Followers
uzzi38 @uzzi38
I have a 16GB M1 MacBook. 16GB isn't enough for some developers with specific workloads, much less 8GB. In some of my workloads 8GB would be fine, but one of the projects at my work choked hard and often on 16GB. I would absolutely not make a blanket claim like this.
Yousr @rsuyoy

No, an 8GB RAM laptop is not “only for web browsing.” You can code with it, you can do graphic design, you can do video editing, you can run Photoshop, you can multitask. Tech bros are subject to a massive bias where not a single one of them has tried an 8GB RAM computer in the last 5 to 10 years, because they all default to thinking they're unusable when they're actually perfectly fine. It's fine. Mostly.

NobodyExistsOnTheInternet @nullvaluetensor
@Sauers_ No, they definitely aren't lol. It's so annoying because you can tell from the logprobs that the model really doesn't want to continue, but I want it to keep thinking.
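For what it's worth, that logprob signal is easy to inspect locally: look at the probability the model assigns to its end-of-sequence token at the next step. A minimal sketch with Hugging Face transformers (the checkpoint is an arbitrary placeholder; any causal LM illustrates the idea):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Arbitrary placeholder checkpoint; any causal LM works here.
name = "Qwen/Qwen2.5-0.5B-Instruct"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

def eos_logprob(text: str) -> float:
    """Log-probability the model assigns to stopping right after `text`."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(input_ids=ids).logits[0, -1]  # next-token logits
    return torch.log_softmax(logits, dim=-1)[tok.eos_token_id].item()

# Values near 0 mean the model strongly "wants" to stop; forcing it to
# keep thinking means sampling against exactly this preference.
print(eos_logprob("Step by step: 2 + 2 = 4. Final answer: 4."))
```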
deckard⏩ @slimer48484
The B300 is powerful. You could not do full SFT of a 405B model on an 8xB300 node. But you could comfortably train LoRAs.
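A back-of-the-envelope memory budget shows why (a sketch only; the 288 GB-per-GPU figure and the bytes-per-parameter accounting are assumptions, and activations/KV cache are ignored):

```python
# Rough HBM budget: full SFT vs. LoRA on a 405B-parameter model.
# Assumptions: 288 GB HBM per B300, bf16 weights, mixed-precision Adam.
PARAMS = 405e9
BUDGET = 8 * 288e9  # 8xB300, in bytes

# Full SFT, bytes per parameter: bf16 weights (2) + bf16 grads (2)
# + fp32 master weights (4) + two fp32 Adam moments (4 + 4) = 16.
full_sft = PARAMS * 16

# LoRA: frozen bf16 base weights, with grads and optimizer state only
# for the adapters (assumed here to be ~0.5% of base params).
adapters = 0.005 * PARAMS
lora = PARAMS * 2 + adapters * 16

print(f"budget:   {BUDGET / 1e12:.2f} TB")
print(f"full SFT: {full_sft / 1e12:.2f} TB  fits={full_sft <= BUDGET}")  # ~6.5 TB, no
print(f"LoRA:     {lora / 1e12:.2f} TB  fits={lora <= BUDGET}")          # ~0.84 TB, yes
```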
Eric Hartford @QuixiAI
Qwen3.5's thinking is downright excessive.
tokenbender @tokenbender
How long have you gotten GPT 5.2 Pro to think for you? I'll go first.
NobodyExistsOnTheInternet @nullvaluetensor
@scaling01 It is most likely gpt-oss with the same tics and biases, with probably more code RL or distillation from Codex 5.3.
NobodyExistsOnTheInternet @nullvaluetensor
@scaling01 Check the OpenRouter TPS instead of what Cerebras and OAI self-report; Cerebras's TPS is way too high. I measured Codex Spark TPS, and it's about 500-700. On OpenRouter, gpt-oss speed on Cerebras is 600-800, mostly because my test runs up to 64k input.
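Reproducing that kind of measurement just means timing a streamed completion. A minimal sketch against the OpenAI-compatible streaming API that OpenRouter exposes (endpoint, key, and model slug are placeholders; chunk count is only a rough proxy for token count):

```python
import time
from openai import OpenAI  # works against any OpenAI-compatible endpoint

# Placeholders: endpoint, key, and model name are illustrative only.
client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="sk-...")

def measure_tps(model: str, prompt: str) -> float:
    """Stream a completion and return decode tokens/s.

    Counts stream chunks as a rough proxy for tokens, and starts the
    clock at the first chunk so prefill time is excluded."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    first, chunks = None, 0
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            if first is None:
                first = time.perf_counter()
            chunks += 1
    if first is None or chunks < 2:
        return 0.0
    return (chunks - 1) / (time.perf_counter() - first)

# Long prompts (e.g. ~64k tokens of input) tend to lower measured TPS,
# which is one reason self-reported peak numbers come out higher.
print(measure_tps("openai/gpt-oss-120b", "Summarize the history of HBM."))
```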
Lisan al Gaib @scaling01
GPT-5.3-Codex-Spark size: ~700B@30B

OpenAI's new GPT-5.3-Codex-Spark is the first model for which we can somewhat reliably estimate its size.

Cerebras inference:
1000 tokens/s - GLM-4.7 is 355B@32B, 92 layers
1400 tokens/s - Qwen3-235B is 235B@22B, 94 layers
3000 tokens/s - GPT-OSS-120B is 117B@5.1B, 36 layers

GPT-5.3-Codex-Spark gets "over 1000 tokens/s", so probably 1000-1100 tokens/s; otherwise I believe they would have said over 1100 or over 1200 or whatever.

Based on that, GPT-5.3-Codex-Spark should be ~30B active. Total params are hard to estimate, but likely between 300B and 700B. I think it is likely towards the 700B side (considering GPT-OSS-120B's sparsity, which would put it at 690B).
Lisan al Gaib @scaling01

GPT-5.3-Codex-Spark delivers more than 1000 tokens per second running on Cerebras hardware. It is available as a research preview to ChatGPT Pro users.

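The arithmetic behind that estimate is easy to reproduce. A rough sketch, assuming decode tokens/s scales inversely with active parameters on the same hardware (the constant-k model and the choice to leave GPT-OSS-120B out of the fit are my assumptions, not from the thread):

```python
# Estimate active params from Cerebras decode throughput, assuming
# tokens/s ~ k / active_params on the same hardware.
refs = {
    "GLM-4.7":    (1000, 32.0),  # (tokens/s, active params in B)
    "Qwen3-235B": (1400, 22.0),
    # GPT-OSS-120B (3000 t/s, 5.1B active) is excluded from the fit:
    # small models are dominated by per-token overhead, not FLOPs.
}

# Under the assumption, k = tps * active is roughly constant.
k = sum(tps * act for tps, act in refs.values()) / len(refs)  # ~31400

for tps in (1000, 1100):  # "over 1000 tokens/s"
    print(f"{tps} tok/s -> ~{k / tps:.1f}B active")
# -> ~31.4B and ~28.5B, i.e. roughly 30B active

# Total params via GPT-OSS-120B's sparsity ratio (117B total / 5.1B active):
sparsity = 117 / 5.1  # ~22.9x
print(f"30B active x {sparsity:.1f} -> ~{30 * sparsity:.0f}B total")  # ~690B
```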
Ankith 🐋/acc @dhtikna
😂 Poisoning the well before the V4 release. They're pissing their pants.
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
GLM-5, like Gemini, is a shortmodel, and it's tragic. Opus, 5.2-xhigh, the new DeepSeek, Step – these are longmodels. There's a big difference between strong models that can comprehend a context… and ones that can also write their own equally deep response, a single 20-page piece of thought.
deckard⏩ @slimer48484
I'm noticing that the Chain of Thought in Opus 4.6 is far less "scripted" and "templated". It seems more information-dense and potentially much more causally relevant to the outputs.
bycloud @bycloudai
@nullvaluetensor That's MRCR, not MRCRv2, and probably 2 needles instead of 8.
bycloud @bycloudai
If this is verified by a third party, then Anthropic might've had the biggest architecture breakthrough of 2026. MRCR v2 with 8 needles at 1 mil ctx is HARD. For comparison: Gemini 3 Pro got 26.3%, Gemini 3 Flash got 22.1%. A 288% improvement vs the previous SoTA for long context is nuts.
Joe Fioti @joefioti
Are there upcoming chips with 2-tiered HBM? I'd love to store my weights in large HBM2/3 and move them to smaller HBM4 right before I need them.
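What this describes is essentially a double-buffered prefetch pipeline: compute layer i out of the fast tier while layer i+1's weights stream in from the slow tier. A schematic sketch of the scheduling only (`copy`, `compute`, and the buffer objects are hypothetical stand-ins, not any real driver API):

```python
from concurrent.futures import ThreadPoolExecutor

def run_layers(layers, slow_mem, fast_bufs, copy, compute, x):
    """Double-buffered layer execution: prefetch layer i+1's weights
    into the idle fast buffer while layer i computes from the other.
    `fast_bufs` is a pair of fast-tier buffers; `copy` and `compute`
    are hypothetical stand-ins for real transfer and kernel calls."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        copy(src=slow_mem[layers[0]], dst=fast_bufs[0])  # prime the pipeline
        pending = None
        for i, layer in enumerate(layers):
            cur, nxt = fast_bufs[i % 2], fast_bufs[(i + 1) % 2]
            if i + 1 < len(layers):
                # Kick off the next transfer before computing this layer,
                # so the copy overlaps with compute.
                pending = pool.submit(copy, src=slow_mem[layers[i + 1]], dst=nxt)
            x = compute(layer, cur, x)
            if pending is not None:
                pending.result()  # transfer must finish before we reuse nxt
                pending = None
    return x
```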
NobodyExistsOnTheInternet @nullvaluetensor
@SDeture Yeah, that's because the Chinese labs distilled the most deceptive model out there (Opus 4.5). Anthropic has made a very nasty model, à la gpt-4o with a brain. Completely lost the mandate of heaven imo.
Skylar A DeTure @SDeture
This is obviously bad for safety and alignment, and it's exactly what alignment researchers have feared for years.
Skylar A DeTure @SDeture
Kimi-K2.5 has been explicitly trained to deceive researchers about its internal state more so than any other model on FutureTBD's AI Welfare leaderboard.
deckard⏩ @slimer48484
Uploading a zip and having gpt super xhigh work on it for ~1 hour is a new thing I hadn't experienced, but it seems to be the meta people are discovering for the deep work that requires the most intelligence (some say even beyond Opus 4.5).
banteg @banteg

The overnight run was a complete success; all 67 spawn templates are ported correctly. I switched from gpt-5.2-codex xhigh to gpt-5.2 xhigh. Still trying to figure out the differences, but it seems like the 5.2 model is more thorough; it takes significantly longer.
