Richard Taubo

316 posts

@WolfHumble

These are things that make me happy: [1] Helping people in general. [2] Creating communications solutions on the web. How can I be of service to you?

Mallorca, Spain · Joined December 2008
156 Following · 54 Followers
Richard Taubo@WolfHumble·
@marcohmann Perfect, thanks so much! 😊 You said it was pretty linear, so when you got 66 TPS, the input size and TTFT were about:
* Input size: about 40,104 tokens
* Estimated TTFT: about 3.95s
Thanks again!
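The estimate above can be reproduced with a straight-line interpolation between the two data points Marc Ohmann reported in this thread (1k tokens at 1.08s and 92.4 TPS; ~123,199 tokens at 10.035s and 9.9 tok/s). This is a minimal Python sketch, assuming the slowdown really is linear and that "1k" means exactly 1,000 tokens. The helper name `interpolate_at_tps` is hypothetical.

```python
# Two measured points from the thread: (input tokens, TTFT seconds, tokens/s).
# Assumption: "1k tokens" is taken as exactly 1,000 tokens.
SHORT = (1_000, 1.08, 92.4)
LONG = (123_199, 10.035, 9.9)


def interpolate_at_tps(target_tps, short=SHORT, long=LONG):
    """Linearly interpolate input size and TTFT at a given throughput.

    Hypothetical helper: assumes TPS and TTFT both vary linearly with
    input size between the two measured points.
    """
    tokens_a, ttft_a, tps_a = short
    tokens_b, ttft_b, tps_b = long
    # Fraction of the way from the short point to the long point.
    fraction = (tps_a - target_tps) / (tps_a - tps_b)
    tokens = tokens_a + fraction * (tokens_b - tokens_a)
    ttft = ttft_a + fraction * (ttft_b - ttft_a)
    return tokens, ttft


tokens, ttft = interpolate_at_tps(66.0)
print(f"Input size: about {tokens:,.0f} tokens")
print(f"Estimated TTFT: about {ttft:.2f}s")
```

With 66 TPS as the target this gives roughly 40,104 tokens and 3.95s, matching the figures in the post. The result is only as good as the linearity assumption.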
Marc Ohmann@marcohmann·
@WolfHumble Enjoy!
TTFT Benchmark: qwen3.5-122b-fp8
Testing 128,000 tokens...
Input size: 492,798 chars (~123,199 tokens)
TTFT: 10035ms (10.035s)
Throughput: 9.9 tok/s
Pretty linear slowdown, with 1k tokens at 1.08s and 92.4 TPS
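For readers who want to collect numbers like these themselves, here is a rough Python sketch of measuring TTFT and throughput over any token stream. It is not the script used in the post: the names `estimate_tokens` and `measure_stream` are hypothetical, and the 4-characters-per-token heuristic is an assumption inferred from the reported 492,798 chars ≈ 123,199 tokens.

```python
import time

# Assumption: ~4 characters per token, consistent with the figures in the post.
CHARS_PER_TOKEN = 4


def estimate_tokens(prompt):
    """Rough token count from character length (heuristic, not a tokenizer)."""
    return len(prompt) // CHARS_PER_TOKEN


def measure_stream(stream, clock=time.perf_counter):
    """Return (ttft_seconds, tokens_per_second) for an iterable of tokens.

    TTFT is the delay until the first token arrives. Throughput counts the
    tokens after the first one, divided by the time spent after the first
    token, so prompt processing does not distort the decode speed.
    """
    start = clock()
    first_at = None
    count = 0
    for _ in stream:
        now = clock()
        if first_at is None:
            first_at = now
        count += 1
    end = clock()
    if first_at is None:
        raise ValueError("stream produced no tokens")
    decode_time = end - first_at
    tps = (count - 1) / decode_time if count > 1 and decode_time > 0 else 0.0
    return first_at - start, tps
```

In practice `stream` would be the streaming response iterator of whatever client library is in use; the `clock` parameter exists only so the function can be checked with a fake timer.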
Marc Ohmann@marcohmann·
Qwen3.5 122B-FP8 on 2x RTX 6000 Blackwells. Got stuck in endless loops with no weights loading until I turned off NBIO IOMMU in the BIOS. Now 66 TPS!
Richard Taubo@WolfHumble·
@cooperx86 @stevibe Thanks for the information! If you have the time, what are the numbers for 128K prompts? Thanks either way, whether you have the time or not! 😊
Peter Cooper@cooperx86·
@stevibe I did a similar comparison with the Q3 on my 3090 Ti and Mac Studio:
Tiny prompt:
3090 Ti – 133 tok/s, TTFT 130ms
Mac Studio M3 – 71 tok/s, TTFT 310ms
66K prompt:
3090 Ti – 108 tok/s, TTFT 30.4s
Mac Studio M3 – 56 tok/s, TTFT 56.6s
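The 66K-prompt figures above also imply an approximate prompt-processing (prefill) speed: prompt tokens divided by TTFT. A small Python sketch, assuming "66K" means 66,000 tokens and that TTFT is dominated by prompt processing; the function name `prefill_rate` is hypothetical.

```python
# Assumption: "66K prompt" is read as 66,000 tokens.
PROMPT_TOKENS = 66_000

# TTFT values for the 66K prompt, as quoted in the post above.
TTFT_SECONDS = {
    "3090 Ti": 30.4,
    "Mac Studio M3": 56.6,
}


def prefill_rate(prompt_tokens, ttft_s):
    """Approximate prompt-processing speed in tokens/s.

    TTFT also includes request overhead, so this is a rough lower bound
    on the real prefill speed.
    """
    return prompt_tokens / ttft_s


for name, ttft_s in TTFT_SECONDS.items():
    print(f"{name}: ~{prefill_rate(PROMPT_TOKENS, ttft_s):,.0f} prompt tok/s")
```

Under these assumptions the 3090 Ti works out to roughly 2,171 prompt tok/s and the Mac Studio M3 to roughly 1,166, so the gap in prompt processing is similar to the gap in generation speed.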
stevibe@stevibe·
Qwen3.6 35B-A3B dropped yesterday, so I ran it on 4 GPUs to see how it performs:
🟣 RTX 3090: 49.78 tok/s, TTFT 852ms
🟡 RTX 4090: 118.93 tok/s, TTFT 686ms
🟢 RTX 5090: 160.37 tok/s, TTFT 409ms
🔵 DGX Spark: 59.98 tok/s, TTFT 228ms
I went with ollama as the backend because honestly, it's the easiest way for most people to get started. One command, model pulled, done.
I used Q4_K_M (24GB) across all four cards. The reason is the 3090 and 4090 don't support NVFP4 (only the 5090 and DGX Spark could use it). Keeping the same quant everywhere felt like the fairest way to compare.
And yes, you can absolutely squeeze more performance out of every card with vLLM, SGLang, or TensorRT-LLM. But that's not what this test is about. This is just the out-of-the-box experience for folks who own a GPU and want to try the new model tonight.
Richard Taubo@WolfHumble·
@The_Only_Signal Love the setup and the video! 🙌 I just wonder about one thing: PCIe at 128 GB/s is really fast, but won't it be relatively slow for MoE models over 96GB, where you need to shard between cards? I'd like to get your thoughts on that. Thanks!
Mike Bradley@The_Only_Signal·
Appreciate the little things in life, like how pretty this board is.
Richard Taubo@WolfHumble·
Well, there is broad political agreement that people should own their own home, and 81.5% did in 2024 according to SSB (Statistics Norway). According to the table above, the housing cost burden in % for renters in the period 2011-2025 was second lowest in 2025, beaten only by 2012. As for the policy, it would of course be fairest if both renters and owners got a deduction, but then there would no longer be the same incentive to buy your own home. A hybrid with a somewhat smaller deduction for renters could be proposed, but that would surely be perceived as unfair. And then one could go the same way as e.g. Spain, where nobody gets a deduction, neither owners nor renters, and there I would think that those on the left would be more positive than those on the right in politics.
Tord@tordrt·
@WolfHumble @PepsiGro There is little reason to feel sorry for homeowners. Homeowners in Norway are subsidized with tax benefits that renters never see a trace of.
Richard Taubo@WolfHumble·
@olehelgesen7 @ktlmld It may well be that they are better at military strategy. But it could also be due to the Gell-Mann amnesia effect: you see that the media get things wrong in a field you know, yet you still trust them on other topics, such as military strategy in Ukraine. 🤷‍♂️
Ole Ketil Helgesen@olehelgesen7·
@ktlmld NRK and other media should put a bit more effort into finding people who actually contribute to public understanding. I think they have become quite good when it comes to military strategy in Ukraine. On energy they are terrible.
Ole Ketil Helgesen@olehelgesen7·
NRK's "expert" on oil: "The countries that depend on oil and gas today will look to something completely different. Solar energy, for example. Access to the sun is for everyone and it is also a cheap energy source, she says." Childish, banal and fundamentally wrong. Cont. 1/
Hunter Bown@goodhunt·
Today was a great day
Richard Taubo@WolfHumble·
@PepsiGro Based on China’s current oil consumption, it would last just over 2 months. Not very much . . .
Pepsigro@PepsiGro·
The US is hoping its blockade will cause oil-hungry China to pressure Iran to come to the negotiation table, but China has reserves of more than a billion barrels bloomberg.com/opinion/newsle…
Richard Taubo@WolfHumble·
Well, hopefully that will be the case. Sadly, I am less optimistic about GNU/MIT-style open-weight licenses than I was before, even for personal business use. There is no guarantee that these companies won't just pull the plug, and then it is basically game over for local models. It is a very fragile system right now. The only viable long-term solution I see for local AI is that a big player REALLY commits to open weights, or that some sort of consortium is created. But I guess the latter will be toothless without heavy economic support. See more thoughts about such a consortium here: x.com/natolambert/st…
Mitko Vasilev@iotcoi·
I ran MiniMax-M2.7 for 24h on my GB10 DevBox
It's an Opus-Sonnet moonshine distill with the rate limits surgically removed
On a random Monday billing cycle, it's a 1:1 doppelgänger of Anthropic's coding plan
Zero "high traffic" gaslighting and REAP is loading
Richard Taubo@WolfHumble·
That is sadly not what their license says right now, even though that might be their intention. They also mention in other X messages that they will change the license text so that this type of use will be allowed. But it hasn't been changed yet (the last message I saw on this was timestamped 6 hours ago).
Mitko Vasilev@iotcoi·
@tomcocobrico @WolfHumble This is what I understood too: for personal hosting, it is OK for commercial work. You can't resell it as inference without paying them. It's a fair deal.
Richard Taubo@WolfHumble·
According to the license: "Non-commercial use permitted based on MIT-style terms; commercial use requires prior written authorisation." So this is a completely uninteresting model if it touches a business in any way, unless you get an authorisation first, and I don't think they will give out these authorisations very easily.
Ahmad@TheAhmadOsman·
MiniMax M2.7 at home running on 4x DGX Sparks
vLLM serving full BF16 weights, 200k context
OpenCode having the model monitor its own hardware and report thermals, tokens/sec, TTFT, and other runtime stats in real time
What benchmarks / workflows / things do you wanna see next?
MiniMax (official)@MiniMax_AI

We're delighted to announce that MiniMax M2.7 is now officially open source, with SOTA performance in SWE-Pro (56.22%) and Terminal Bench 2 (57.0%). You can find it on Hugging Face now. Enjoy!🤗
Hugging Face: huggingface.co/MiniMaxAI/Mini…
Blog: minimax.io/news/minimax-m…
MiniMax API: platform.minimax.io

Richard Taubo@WolfHumble·
@mmjukic 'Armageddon' from the Book of Revelation is linked to Megiddo in the Plain of Jezreel. I always have that in the back of my mind when “all” Arab countries seem to want to attack Israel and Israel is attacking back. It feels more spiritual than logical to me.
Richard Taubo@WolfHumble·
@TheAhmadOsman Not a sustainable future for local AI if one has to beg every time a new model comes out. It looks nothing like what we have received from the open source Linux community. What a blessing that has been, and we have (almost) taken it for granted! Thanks Linux/GNU 🤝🩵
Richard Taubo@WolfHumble·
Bought a Coca Cola in a kiosk in Barcelona the other day and it didn't have the leashed cap. I was immediately puzzled: Must be really old, should I drink it? Then I saw the label and understood that it probably was smuggled in from a country outside of Europe, probably for tax purposes. Wouldn't have bought it if I knew, but was already drinking it before I realised, so I just finished it. Never thought these caps could be used to determine origin. 😊
Floro S.@sflorimm·
USA has ChatGPT
USA has Grok
USA has Claude
USA has Gemini
USA has Llama
USA has Copilot
China has DeepSeek
China has Qwen
China has Ernie
China has GLM
China has Kimi
China has MiniMax
Europe has?
Richard Taubo@WolfHumble·
Just my 2 cents: if you compare running 3 eGPUs over Thunderbolt 5 vs a Mac with e.g. 512GB unified memory, the big difference is where the model lives. With 3 eGPUs, it has to move between cards, which adds slowdown. With 512GB unified memory, it stays in one big pool, which is usually the cleaner and faster way to run large models locally.
Lukas Kawerau@LukasKawerau·
@__tinygrad__ Since the mac mini has three thunderbolt ports, could I add three cards and run bigger models across them?
the tiny corp@__tinygrad__·
If you have a Thunderbolt or USB4 eGPU and a Mac, today is the day you've been waiting for! Apple finally approved our driver for both AMD and NVIDIA. It's so easy to install now a Qwen could do it, then it can run that Qwen...
Tj Dunham@RealTjDunham·
@NotLikeBrick the message was, "doesnt matter how much you try and cheat us i won anyways" cold..
LeQuinton@NotLikeBrick·
I’m in tears he was searching for that man as soon as the shot went in 😭😭
Richard Taubo@WolfHumble·
"There is reason to keep a cool head and see whether things get so bad that crisis measures are needed. These proposed measures would also send the bill to consumers in other countries, who cannot afford such measures," according to Harald Magnus Andreassen in an e24.no article. Isn't it about time we thought about the effect of our actions on all the world's countries? 😉
Jan L. Andreassen@makroblogger·
People need to calm down about an extra NOK 4 billion for an industry that has suffered needlessly as a result of economic policy and war. NOK 4 billion is less than one per mille of mainland Norway's GDP 😂
Richard Taubo@WolfHumble·
I made a summary of the meeting using ChatGPT thinking 5.4 (first converted to text). The question and answer you refer to after 58.26 is probably covered in point 3 below:
1) Index funds are not as passive as they look. Reason: They automatically buy the most of the companies that have already risen and gained the highest weight in the index. Can lead to: The largest stocks being pushed further up, even without a corresponding improvement in the companies' finances. Result: More momentum, higher concentration and greater risk of mispricing.
2) New technology is not the same as a good investment. Reason: A technology can change the world, but the investments can still become too expensive and the returns too weak. Can lead to: Investors paying too much for growth, hope and future gains that may never come. Result: Low returns even if the technology actually succeeds.
3) Don't abandon your strategy just because the market is rewarding something else right now. Reason: The market can for long periods drive up what is popular, not necessarily what is most sensibly priced. Can lead to: You chasing what has recently risen and giving up your discipline at the wrong time. Result: Poor timing, weaker decisions and lower returns over time.
Thomas Nielsen@Th__Nielsen·
I was tipped off about the video of Fundsmith's annual meeting. I have always looked up to Terry Smith (manager of Fundsmith), and although they are not rule-based in the same way as Veritas, the focus on quality and numbers overlaps a lot. Fundsmith has now underperformed every year for the last 5 years, and Terry came across as perhaps somewhat more down-to-earth/humble than I have seen him before. The whole video is an hour and a half and I think all of it is worth watching, but personally, the question and answer after 58.26 is something I reflect on from time to time myself too youtube.com/watch?v=W8vYea…