Richard Taubo

316 posts

@WolfHumble

These are things that make me happy: [1] Helping people in general. [2] Creating communications solutions on the web. How can I be of service to you?

Mallorca, Spain · Joined December 2008

156 Following · 54 Followers

Richard Taubo@WolfHumble·14h

@marcohmann Perfect, thanks so much! 😊 You said it was pretty linear, so at 66 TPS the input size and TTFT would have been about:
* Input size: about 40,104 tokens
* Estimated TTFT: about 3.95s
Thanks again!

Marc Ohmann@marcohmann·1d

@WolfHumble Enjoy! TTFT Benchmark: qwen3.5-122b-fp8
Testing 128,000 tokens...
Input size: 492,798 chars (~123,199 tokens)
TTFT: 10035ms (10.035s)
Throughput: 9.9 tok/s
Pretty linear slowdown, with 1k tokens at 1.08s and 92.4 TPS
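The estimate in the reply above can be reproduced by linear interpolation between the two measurements Marc reported: 1k tokens at 1.08s TTFT and 92.4 TPS, and ~123,199 tokens at 10.035s TTFT and 9.9 TPS. This is a minimal sketch that assumes, as a simplification, that both TPS and TTFT change linearly with prompt length between those two points; `Measurement` and `interpolate_at_tps` are hypothetical names.

```python
from typing import NamedTuple


class Measurement(NamedTuple):
    tokens: float  # prompt length in tokens
    ttft_s: float  # time to first token, seconds
    tps: float     # decode throughput, tokens/second


def interpolate_at_tps(target_tps: float, a: Measurement, b: Measurement) -> tuple[float, float]:
    """Estimate (prompt tokens, TTFT) at target_tps, assuming everything is
    linear in prompt length between measurements a and b (a simplification)."""
    # Fraction of the way from a to b at which throughput equals target_tps.
    frac = (target_tps - a.tps) / (b.tps - a.tps)
    tokens = a.tokens + frac * (b.tokens - a.tokens)
    ttft = a.ttft_s + frac * (b.ttft_s - a.ttft_s)
    return tokens, ttft


# The two data points from the benchmark post above.
short_prompt = Measurement(tokens=1_000, ttft_s=1.08, tps=92.4)
long_prompt = Measurement(tokens=123_199, ttft_s=10.035, tps=9.9)

tokens, ttft = interpolate_at_tps(66.0, short_prompt, long_prompt)
print(f"~{tokens:,.0f} tokens, TTFT ~{ttft:.2f}s")  # ~40,104 tokens, TTFT ~3.95s
```

Real throughput curves are rarely perfectly linear, so treat the result as a rough estimate rather than a measurement.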

Marc Ohmann@marcohmann·3d

Qwen3.5 122B-FP8 on 2x RTX 6000 Blackwells. Got stuck in endless loops with no weights loading until I turned off NBIO IOMMU in the BIOS. Now 66 TPS!

Richard Taubo@WolfHumble·2d

@cooperx86 @stevibe Thanks for the information! If you have the time, what are the numbers for 128K prompts? Thanks either way, whether you have time or not! 😊

Peter Cooper@cooperx86·2d

@stevibe I did a similar comparison with the Q3 on my 3090 Ti and Mac Studio:
Tiny prompt:
3090 Ti – 133 tok/s, TTFT 130ms
Mac Studio M3 – 71 tok/s, TTFT 310ms
66K prompt:
3090 Ti – 108 tok/s, TTFT 30.4s
Mac Studio M3 – 56 tok/s, TTFT 56.6s
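One way to read the 66K-prompt numbers above is as an effective prefill rate: prompt tokens divided by TTFT. This sketch assumes "66K" means 66,000 tokens (it could also be 65,536) and that TTFT is dominated by prompt processing; `prefill_rate` is a hypothetical helper name.

```python
def prefill_rate(prompt_tokens: int, ttft_s: float) -> float:
    """Approximate prompt-processing throughput in tokens/second,
    assuming TTFT is almost entirely prefill time."""
    return prompt_tokens / ttft_s


PROMPT_TOKENS = 66_000  # assumption: "66K" read as 66,000 tokens

for device, ttft_s in [("3090 Ti", 30.4), ("Mac Studio M3", 56.6)]:
    print(f"{device}: ~{prefill_rate(PROMPT_TOKENS, ttft_s):,.0f} tok/s prefill")
```

By this measure the 3090 Ti processes the 66K prompt roughly 1.9x faster than the Mac Studio.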

stevibe@stevibe·2d

Qwen3.6 35B-A3B dropped yesterday, so I ran it on 4 GPUs to see how it performs:
🟣 RTX 3090: 49.78 tok/s, TTFT 852ms
🟡 RTX 4090: 118.93 tok/s, TTFT 686ms
🟢 RTX 5090: 160.37 tok/s, TTFT 409ms
🔵 DGX Spark: 59.98 tok/s, TTFT 228ms
I went with ollama as the backend because, honestly, it's the easiest way for most people to get started. One command, model pulled, done.
I used Q4_K_M (24GB) across all four cards. The reason is that the 3090 and 4090 don't support NVFP4 (only the 5090 and DGX Spark could use it). Keeping the same quant everywhere felt like the fairest way to compare.
And yes, you can absolutely squeeze more performance out of every card with vLLM, SGLang, or TensorRT-LLM. But that's not what this test is about. This is just the out-of-the-box experience for folks who own a GPU and want to try the new model tonight.
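To turn the four results above into a single wall-clock figure, a rough model is: response time ≈ TTFT + output tokens / decode rate. This sketch applies it to a hypothetical 1,000-token reply; it ignores prompt-length effects and run-to-run variance, and `response_time` is an illustrative name, not part of ollama.

```python
# Decode throughput (tok/s) and TTFT (s) from the post above (Q4_K_M, ollama).
RESULTS = {
    "RTX 3090": (49.78, 0.852),
    "RTX 4090": (118.93, 0.686),
    "RTX 5090": (160.37, 0.409),
    "DGX Spark": (59.98, 0.228),
}


def response_time(tok_per_s: float, ttft_s: float, output_tokens: int) -> float:
    """Rough wall-clock estimate: wait for the first token, then decode
    the reply at a constant rate (a simplification)."""
    return ttft_s + output_tokens / tok_per_s


OUTPUT_TOKENS = 1_000  # hypothetical reply length

# Print the cards from fastest to slowest end-to-end response.
for gpu, (tps, ttft) in sorted(RESULTS.items(), key=lambda kv: response_time(*kv[1], OUTPUT_TOKENS)):
    print(f"{gpu}: ~{response_time(tps, ttft, OUTPUT_TOKENS):.1f}s")
```

Because TTFT is a fixed term in this model, the DGX Spark's short TTFT matters less and less as replies get longer, and the decode rate dominates.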

Richard Taubo@WolfHumble·2d

@The_Only_Signal Love the setup and the video! 🙌 Just wondering about one thing: PCIe at 128 GB/s is really fast, but won't it still be relatively slow for MoE models over 96 GB, where you need to shard across cards? I'd like to hear your thoughts on that. Thanks!
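For context on the "128 GB/s" figure: assuming the board uses PCIe 5.0 x16 slots (the post does not say), each lane runs at 32 GT/s with 128b/130b line encoding, which works out to roughly 63 GB/s in each direction, or about 126 GB/s counting both directions. This sketch does that arithmetic; it ignores packet and protocol overhead, so real-world throughput is lower. `pcie_bandwidth_gbps` is a hypothetical helper name.

```python
def pcie_bandwidth_gbps(gt_per_s: float, lanes: int, bidirectional: bool = False) -> float:
    """Theoretical PCIe bandwidth in GB/s after 128b/130b line encoding
    (PCIe 3.0 through 5.0). Packet/protocol overhead is ignored."""
    encoding_efficiency = 128 / 130
    bytes_per_s = gt_per_s * 1e9 * lanes * encoding_efficiency / 8
    gb_per_s = bytes_per_s / 1e9
    return gb_per_s * 2 if bidirectional else gb_per_s


# Assumption: PCIe 5.0 (32 GT/s per lane), x16 slot.
print(f"per direction: {pcie_bandwidth_gbps(32, 16):.1f} GB/s")
print(f"both directions: {pcie_bandwidth_gbps(32, 16, bidirectional=True):.1f} GB/s")
```

Even the bidirectional figure is well over an order of magnitude below the memory bandwidth of a modern high-end GPU's own VRAM, which is the gap the question is getting at.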

Mike Bradley@The_Only_Signal·3d

Appreciate the little things in life, like how pretty this board is.

Richard Taubo@WolfHumble·3d

Well, it is broadly a political goal that people should own their own homes, and 81.5% do, according to Statistics Norway (SSB) in 2024. According to the table above, the housing cost burden (in %) for renters over the period 2011-2025 was the second lowest in 2025, beaten only by 2012. As for the policy, it would of course have been fairest if both renters and owners got a tax deduction, but then there would no longer be the same incentive to buy your own home. A hybrid with a somewhat smaller deduction for renters could have been proposed, but it would surely have been perceived as unfair. Alternatively, one could go the same way as e.g. Spain, where no one gets a deduction, neither owners nor renters, and there I would expect those on the left to be more positive than those on the right.

Tord@tordrt·4d

@WolfHumble @PepsiGro There's hardly much reason to feel sorry for homeowners. Homeowners in Norway are subsidised through tax advantages that renters never see a penny of.

Pepsigro@PepsiGro·4d

Things are actually going quite well in Norway ssb.no/bygg-bolig-og-…

Richard Taubo@WolfHumble·4d

@olehelgesen7 @ktlmld It may very well be that they are better at military strategy. But it could also be the Gell-Mann amnesia effect: you see the media get things wrong in a field you know, yet you still trust them on other topics, such as military strategy in Ukraine. 🤷‍♂️

Ole Ketil Helgesen@olehelgesen7·4d

@ktlmld NRK and other media should put a bit more effort into finding people who actually help inform the public. I think they have become quite good when it comes to military strategy in Ukraine. On energy they are terrible.

Ole Ketil Helgesen@olehelgesen7·4d

NRK's "expert" on oil: "The countries that depend on oil and gas today will look to something entirely different. Solar energy, for example. Access to the sun is available to everyone, and it is also a cheap energy source, she says." Childish, banal and fundamentally wrong. Cont. 1/

Richard Taubo@WolfHumble·5d

@goodhunt Yay! 🙌

Hunter Bown@goodhunt·5d

Today was a great day

Richard Taubo@WolfHumble·5d

@PepsiGro Based on China’s current oil consumption, it would last just over 2 months. Not very much . . .

Pepsigro@PepsiGro·5d

The US is hoping its blockade will cause oil-hungry China to pressure Iran to come to the negotiation table, but China has reserves of more than a billion barrels bloomberg.com/opinion/newsle…

Richard Taubo@WolfHumble·6d

Well, hopefully that will be the case. Sadly, I am less optimistic about GNU/MIT-style open weight licenses than I was before, even for personal business use. There is no guarantee that these companies won't just pull the plug, and then local models are basically game over. Very fragile system right now. The only viable long-term solution I see for Local AI is that a big player REALLY commits to open weights, or that some sort of consortium is created. But I guess the latter will be toothless without heavy economic support. See more thoughts about such a consortium here: x.com/natolambert/st…

Jeffrey 杰弗瑞@tomcocobrico·6d

@WolfHumble @iotcoi they just clarified it's mostly due to hosting firms having bad quants/configs, which makes MiniMax look bad. Might be fully viable with self-hosting; they want to adjust the license to clear things up x.com/RyanLeeMiniMax…

RyanLee@RyanLeeMiniMax

x.com/i/article/2043…

Mitko Vasilev@iotcoi·6d

I ran MiniMax-M2.7 for 24h on my GB10 DevBox It's an Opus-Sonnet moonshine distill with the rate limits surgically removed On a random Monday billing cycle, it's a 1:1 doppelgänger of Anthropic's coding plan Zero "high traffic" gaslighting and REAP is loading

Richard Taubo@WolfHumble·6d

That is sadly not what their license says right now, even though that might be their intention. They also mention in other X messages that they will change the license text so that this type of use will be allowed. But it hasn't been changed yet (the last message I saw on this was timestamped 6 hours ago).

Mitko Vasilev@iotcoi·6d

@tomcocobrico @WolfHumble This is what I understood too: for personal hosting, it is OK for commercial work. You can't resell it as inference without paying them. It's a fair deal

Richard Taubo@WolfHumble·13 Apr

According to the license: "Non-commercial use permitted based on MIT-style terms; commercial use requires prior written authorisation." So this is a completely uninteresting model if it touches a business in any way, unless you get authorisation first, and I don't think they will hand out these authorisations very easily.

Ahmad@TheAhmadOsman·13 Apr

Ran some benchmarks on different context lengths, all the way up to 180k. Findings below

Ahmad@TheAhmadOsman

MiniMax M2.7 Benchmarks on 4x DGX Spark + vLLM
- 45k / 110k / 178k requests: 1.63k prefill + 14.30 decode tokens/sec
- 25k / 49k / 74k requests: 2.52k prefill + 23.38 decode tokens/sec
- 4k / 8k / 16k requests: 3.45k prefill + 33.44 decode tokens/sec
What else do you wanna see?
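A rough way to read these tiers is to estimate how long prefill alone takes at the largest request size in each: prompt tokens divided by prefill rate. This sketch assumes "k" means 1,000 tokens and that the reported prefill rate applies to a single request (the post does not say whether it is aggregated across concurrent requests); `prefill_seconds` is a hypothetical helper name.

```python
def prefill_seconds(prompt_tokens: int, prefill_tps: float) -> float:
    """Time spent processing the prompt before the first output token,
    assuming a constant per-request prefill rate."""
    return prompt_tokens / prefill_tps


# (largest request size in the tier, prefill tok/s, decode tok/s) from the post.
# Assumption: "k" means 1,000 tokens.
TIERS = [
    (16_000, 3_450, 33.44),
    (74_000, 2_520, 23.38),
    (178_000, 1_630, 14.30),
]

for tokens, prefill_tps, decode_tps in TIERS:
    wait = prefill_seconds(tokens, prefill_tps)
    print(f"{tokens // 1000}k prompt: ~{wait:.0f}s prefill, then ~{decode_tps} tok/s decode")
```

Under these assumptions, the top tier means waiting close to two minutes for the first token before decoding even begins.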

Ahmad@TheAhmadOsman·13 Apr

MiniMax M2.7 at home, running on 4x DGX Sparks
vLLM serving full BF16 weights, 200k context
OpenCode having the model monitor its own hardware and report thermals, tokens/sec, TTFT, and other runtime stats in real time
What benchmarks / workflows / things do you wanna see next?

MiniMax (official)@MiniMax_AI

We're delighted to announce that MiniMax M2.7 is now officially open source. With SOTA performance in SWE-Pro (56.22%) and Terminal Bench 2 (57.0%). You can find it on Hugging Face now. Enjoy!🤗
Hugging Face: huggingface.co/MiniMaxAI/Mini…
Blog: minimax.io/news/minimax-m…
MiniMax API: platform.minimax.io

Richard Taubo@WolfHumble·10 Apr

@mmjukic 'Armageddon' from the Book of Revelation is linked to Megiddo in the Plain of Jezreel. I always have that in the back of my mind when “all” Arab countries seem to want to attack Israel and Israel is attacking back. It feels more spiritual than logical to me.

Marko Jukic@mmjukic·10 Apr

Why does that sound identical to America's foreign policy problem? x.com/EmissaryofZuul…

EmissaryofZuul@EmissaryofZuul2

@mmjukic The current Israeli government is just stuck in the 90s when it comes to the balance of power, and is seeking maximalist irredentist goals for short-term domestic political reasons

Marko Jukic@mmjukic·10 Apr

For reasons I do not need to go over, everyone is convinced of Israel's supreme rationality, intelligence, and 15-D chess no matter what it does. Few are willing to consider that Israel is just pursuing bad geopolitical and military strategy because of its own blind spots.

Spandrell@spandrell4

If this is true, what I don't get is the Israeli insistence on continuing the war

Richard Taubo@WolfHumble·2 Apr

@TheAhmadOsman Not a sustainable future for LocalAI if one has to beg every time a new model comes out. It looks nothing like what we have received from the open source Linux community. What a blessing that has been, and we have (almost) taken it for granted! Thanks, Linux/GNU 🤝🩵

Ahmad@TheAhmadOsman·2 Apr

Maybe we’ll be getting the Qwen 3.6 weights after all (this guy works on the Qwen team)

Yucheng Li@liyucheng_2

@TheAhmadOsman @ChujieZheng Best time to buy a GPU, preparing for the coming Qwen3.6 😄

Richard Taubo@WolfHumble·1 Apr

Bought a Coca-Cola at a kiosk in Barcelona the other day, and it didn't have the tethered cap. I was immediately puzzled: must be really old, should I drink it? Then I saw the label and understood that it had probably been smuggled in from a country outside Europe, likely for tax reasons. I wouldn't have bought it if I had known, but I was already drinking it before I realised, so I just finished it. Never thought these caps could be used to determine origin. 😊

Floro S.@sflorimm·1 Apr

@inc_ongi this one was brutal.

Floro S.@sflorimm·1 Apr

USA has ChatGPT USA has Grok USA has Claude USA has Gemini USA has Llama USA has Copilot China has DeepSeek China has Qwen China has Ernie China has GLM China has Kimi China has MiniMax Europe has?

Richard Taubo@WolfHumble·1 Apr

Just my 2 cents: if you compare running 3 eGPUs over Thunderbolt 5 with a Mac with e.g. 512GB of unified memory, the big difference is where the model lives. With 3 eGPUs, the model is split across the cards, so data has to move between them over Thunderbolt, which adds slowdown. With 512GB of unified memory, it stays in one big pool, which is usually the cleaner and faster way to run large models locally.

Lukas Kawerau@LukasKawerau·1 Apr

@__tinygrad__ Since the mac mini has three thunderbolt ports, could I add three cards and run bigger models across them?

the tiny corp@__tinygrad__·1 Apr

If you have a Thunderbolt or USB4 eGPU and a Mac, today is the day you've been waiting for! Apple finally approved our driver for both AMD and NVIDIA. It's so easy to install now a Qwen could do it, then it can run that Qwen...

Richard Taubo@WolfHumble·31 Mar

@RealTjDunham @NotLikeBrick Childish.

Tj Dunham@RealTjDunham·31 Mar

@NotLikeBrick the message was, "doesnt matter how much you try and cheat us i won anyways" cold..

LeQuinton@NotLikeBrick·30 Mar

I’m in tears he was searching for that man as soon as the shot went in 😭😭

Richard Taubo@WolfHumble·25 Mar

"There is reason to keep a cool head and see whether it gets so bad that crisis measures are needed. These proposed measures would also send the bill to consumers in other countries who cannot afford such measures," according to Harald Magnus Andreassen in an e24.no article. Isn't it about time we started thinking about the effect of our actions on every country in the world? 😉

Jan L. Andreassen@makroblogger·25 Mar

People need to calm down about NOK 4 billion extra to an industry that has suffered unnecessarily as a result of economic policy + war. NOK 4 billion is less than one per mille of mainland Norway's GDP 😂

Richard Taubo@WolfHumble·20 Mar

Made a summary of the meeting via ChatGPT thinking 5.4 (first converted to text). The question and answer you refer to after 58:26 are, I think, covered in point 3 below:
1) Index funds are not as passive as they look.
Reason: They automatically buy the most of the companies that have already risen and gained the highest weight in the index.
Could lead to: The largest stocks being pushed further up, even without a corresponding improvement in the companies' finances.
Result: More momentum, higher concentration and greater risk of mispricing.
2) New technology is not the same as a good investment.
Reason: A technology can change the world, but the investments can still end up too expensive and the returns too weak.
Could lead to: Investors paying too much for growth, hope and future gains that may never materialise.
Result: Low returns even if the technology actually succeeds.
3) Don't abandon your strategy just because the market is rewarding something else right now.
Reason: For long periods, the market can drive up what is popular, not necessarily what is most sensibly priced.
Could lead to: You chasing what has recently risen and giving up your discipline at the wrong time.
Result: Poor timing, weaker decisions and lower returns over time.

Thomas Nielsen@Th__Nielsen·20 Mar

I was tipped off about the video of Fundsmith's annual meeting. I have always looked up to Terry Smith (manager of Fundsmith), and even though they are not rule-based in the same way as Veritas, their focus on quality and numbers overlaps a lot. Fundsmith has now underperformed every year for the last 5 years, and Terry perhaps came across as somewhat more down-to-earth/humble than I have seen him before. The whole video is an hour and a half long and I think all of it is worth watching, but personally, the question & answer after 58:26 was something I sometimes reflect on myself too youtube.com/watch?v=W8vYea…
