Chris Griffin @csgriff_
123 posts
[profile banner]
Helping LLMs to predict the future. Founder. Company not linked here so banter is possible
London, England · Joined February 2024
382 Following · 36 Followers
Chris Griffin @csgriff_
@natolambert “Within 12 months AI may make up 30% of a F500’s workforce, but within 30 years Nathan Lambert will never be on this panel”
0 replies · 0 reposts · 0 likes · 16 views
Nathan Lambert @natolambert
Any good quotes on the Nvidia GTC open models panel? Maybe they'll invite me to one some day 🥺
8 replies · 0 reposts · 63 likes · 10.1K views
Chris 🇨🇦 @llm_wizard
And they call me… Joe Nemotron.
[tweet media]
9 replies · 0 reposts · 68 likes · 3.2K views
Kyle Hessling @KyleHessling1
@csgriff_ I think I found the answer. Just turn thinking off! I thought it would be a big performance hit, but it doesn't seem that big; the gap is even smaller for the Coding index. Will test and report. The one on the right is non-thinking, and it still beats the brand-new Nemotron!
[tweet media]
1 reply · 0 reposts · 1 like · 25 views
Kyle Hessling @KyleHessling1
So I have been hitting the Hermes agent and local Qwen 27B hard on the 5090, doing some Apple Swift development. I am trying to get a novel camera app idea built entirely on local compute with 27B, just to show it is possible. It's working, but it definitely feels like I'm in the Sonnet 3.7 days! Will put it on the App Store when done as a proof of capability.

Happy to report I have the app working, but cleaning up bugs is taking forever for the following reason. Perhaps @sudoingX can help me here: at longer contexts, the model decides to think for literally 20-30 minutes. It's snappy at shorter contexts sub-30k, and token speed is still in the high 50s even at 75k context, but it just sits and thinks forever with every prompt after 60k, almost like there's a hard point where it switches to forever-think. I guess I could /compact more often, but I have so much headroom! Maybe a problem with the Q4_K_M quant, and I should try Q5?

I am also trying to get a Minimax M2.5 REAP running at a faster speed locally. Tried the experimental GreenBoost for an entire day, but didn't see any significant improvement as that method is in its infancy (promising for the future though, no doubt), so I switched back to split inference, but I'm getting wrecked by the CPU/RAM expert-shuffling bottleneck, barely using my 5090 and getting about 14 tps with a ton of unquantized context. Going to try ik_llama.cpp today!

Not gonna lie, a 128GB Mac Studio or DGX Spark is looking like a tempting alternative to get these big MoE REAPs running at a lower price than an RTX 6000. The bummer is that even though the RAM and compute are way weaker, there's no expert-shuffling bottleneck... But I do love the granular experimentation and levers to pull with Nvidia GPUs. @TheAhmadOsman speak some sense into me please!
2 replies · 0 reposts · 2 likes · 150 views
Chris Griffin @csgriff_
@rohitdotmittal @garrytan @perplexity_ai Agreed, Computer was a really pleasant surprise in a sea of claims that don’t stand up to real-world use. Although when it said “this will use a very large amount of your credits” I did have a moment of panic, thinking I was running up a nasty bill
0 replies · 0 reposts · 0 likes · 53 views
Rohit Mittal @rohitdotmittal
Perplexity Computer wins hands down over Claude Code and Codex, even with their latest versions. The outputs are 5x-10x better for some use cases. @perplexity_ai - can you please launch a desktop app and a CLI?
63 replies · 20 reposts · 420 likes · 66.8K views
Chris Griffin @csgriff_
@KuittinenPetri @666Sebo @somet3chth1ng @sudoingX Table is super interesting, thanks. I can see why synthetic data is useful, but it doesn’t sound ideal for giving a model world knowledge. Unsurprisingly, I find dense models much better for understanding how countries work. Unfortunately, clients do not like Chinese models one bit
1 reply · 0 reposts · 0 likes · 39 views
Petri Kuittinen @KuittinenPetri
I like the Nvidia Nemotron 3 family more than gpt-oss-120, even though Nemotron is trained pretty much all on synthetic data, and that seems to be the case for the gpt-oss family as well (but "Open"AI doesn't reveal what the training sets were). Qwen3.5 clearly has some real data as well, and probably at least twice the amount of training tokens, thus it has some taste of the real world as well.
[tweet media]
1 reply · 0 reposts · 1 like · 39 views
Sudo su @sudoingX
local AI hardware tiers:
$4,699 - DGX Spark (NVIDIA wants you here)
$1,989 - RTX 4090 (overkill for most)
$1,000 - RTX 3090 used (sweet spot)
$250 - RTX 3060 used (currently testing every model that fits 12GB)
$0 - CPU only (it still works)
jensen announced the top. i've been posting receipts from the bottom.
100 replies · 25 reposts · 554 likes · 34.9K views
Chris Griffin @csgriff_
@KuittinenPetri @666Sebo @somet3chth1ng @sudoingX Yeah it’s a shame. I auto-query 200k times a day, primarily on current affairs and geopolitics. Qwen has always felt solid, while gpt-oss and Nvidia kind of feel ‘flat’; difficult to describe exactly, but like they are less worldly
1 reply · 0 reposts · 2 likes · 49 views
Petri Kuittinen @KuittinenPetri
I didn't. When it comes to multilingual support, Qwen3.5-27B is much better than gpt-oss-120b, same for coding. I never liked the gpt-oss family: too much training on synthetic data vs real-life quality data, and only good in narrow STEM topics and coding, bad at everything else, especially creative writing. I should try the latest Mistral Small 4 (199B-A6B) and Nemotron 3 Super (120B-A12B). But I already deleted the gpt-oss models from all my computers. I usually keep only ~20 models per computer and delete those which don't work for me.
1 reply · 0 reposts · 1 like · 84 views
NVIDIA AI Developer @NVIDIAAIDev
🙌 Andrej Karpathy’s lab has received the first DGX Station GB300 -- a Dell Pro Max with GB300. 💚 We can't wait to see what you’ll create @karpathy! 🔗 blogs.nvidia.com/blog/gtc-2026-… @DellTech
[tweet media]
120 replies · 267 reposts · 4.2K likes · 1.2M views
ᐱ ᑎ ᑐ ᒋ ᕮ ᒍ
"...those models have been extracted. It's called a distillation attack, Eli. I have unfettered access to your model so I generate millions of exchanges and use the outputs as training data" "No, no, no, this is Claude, do you understand?" "Do you understand, Eli? That's more to the point. Do you understand? I eat your data. I eat your compute. I eat it all up"
[tweet media]
15 replies · 73 reposts · 836 likes · 51.7K views
Chris Griffin @csgriff_
@Yuchenj_UW They won’t; this is people who don’t understand the world getting very overconfident. If they ever actually make something crazy powerful it will be taken out of their hands
0 replies · 0 reposts · 0 likes · 9 views
Yuchen Jin @Yuchenj_UW
Some people at frontier AI labs told me they believe startups are over. OpenAI, Anthropic, Google, xAI will absorb every industry as AGI nears. Coding today, science, medicine, and finance next. Then everything else. If they’re right, that’s a pretty boring end of the world.
537 replies · 162 reposts · 3K likes · 880.8K views
Chris Griffin @csgriff_
@BuildScaleLead @staysaasy Finally, the actual answer. “No one gets fired for buying IBM”, but they defo do for switching important apps across to a couple of socially awkward randoms. Get back in your box
1 reply · 0 reposts · 2 likes · 126 views
Ryan Marsh @BuildScaleLead
@staysaasy Sales and marketing are undefeated. Nerds keep entering the ring and getting cold-cocked by the reality of business.
2 replies · 1 repost · 42 likes · 2.3K views
staysaasy @staysaasy
Re: SaaS death - I actually know of two separate SaaS companies that had employees leave in the last two years to build competitors, and in both cases the competitive products are now dead, with zero traction. And the people that left those companies were very, very smart. And the products they built were the same shape as the companies they left, and they used AI to build them. But they had absolutely 0 success.
65 replies · 15 reposts · 502 likes · 108.9K views
AlexK @AlexKi1993
@csgriff_ @TheAhmadOsman Sparks will have the better performance, but setup of the cluster is a lot more complicated.
[tweet media]
1 reply · 0 reposts · 2 likes · 103 views
Ahmad @TheAhmadOsman
NVIDIA now officially supports 4x DGX Sparks btw
[tweet media]
36 replies · 31 reposts · 387 likes · 18.5K views
Ahmad @TheAhmadOsman
the state of AI
[tweet media]
6 replies · 7 reposts · 55 likes · 2.6K views
Chris Griffin @csgriff_
@TheAhmadOsman Ahmad, serious question: if the Mac Studio 512GB (discontinued) isn’t the best option for a prosumer trying to run models of this size class, what is?
0 replies · 0 reposts · 1 like · 370 views
Ahmad @TheAhmadOsman
INCREDIBLE STUFF INCOMING Nemotron 3 Ultra Base (~500B) benchmarks against Kimi K2 and GLM looking goood
[tweet media]
44 replies · 50 reposts · 845 likes · 79.8K views
Chris Griffin @csgriff_
@llm_wizard Chris remember the wisdom that came to you in a dream. Nvidia must make a dense model approx 30b, it is very important, woooooo
0 replies · 0 reposts · 1 like · 89 views
Chris Griffin @csgriff_
@DeepInfra @nvidia Is the reason you haven’t hosted Qwen3.5 models yet that you are trying to work out how to deal with the insane reasoning length?
0 replies · 0 reposts · 1 like · 20 views
DeepInfra @DeepInfra
@NVIDIA GTC starts tomorrow. If you're in San Jose this week - come find us at Booth #4022. Happy to talk models, inference, Blackwell optimizations, or anything AI. See you there!
1 reply · 0 reposts · 7 likes · 306 views
Chris Griffin @csgriff_
@gbertb @lupi_arsene @TeksEdge I would argue it is serious for high-volume automated queries. I put approx 100k a day through Qwen 27b; a tier of task below Opus, sure, but not every task requires the heaviest model. Actually, looking at your profile, maybe valuable for your use case
0 replies · 0 reposts · 0 likes · 100 views
David Hendrickson @TeksEdge
💵 Home Inferencing Cost Comparison Running For 1 Day
🏠 Personal LLM
🖥️ DGX-Spark Clone ($3K Asus)
🤖 Qwen3.5 27B @ 30 tps
⏲️ 24 hours
🪙 2.6M tokens
Cost = $0.30 of electricity
🏭 BigAI
🤖 Sonnet 4.6 @ 55 tps (my experience)
⏲️ 13 hours
🪙 2.6M tokens
Cost = $39 in tokens
80 replies · 26 reposts · 629 likes · 69.4K views
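The arithmetic in that comparison checks out; a minimal sketch, using only the tps, hours, and cost figures stated in the tweet:

```python
# Sanity-check the cost comparison above.
# All figures (tps, hours, dollar costs) come from the tweet itself.

def tokens_generated(tps: float, hours: float) -> int:
    """Total tokens produced at a steady tokens-per-second rate."""
    return int(tps * hours * 3600)

# Local: Qwen3.5 27B on a DGX-Spark clone at 30 tps for 24 hours
local_tokens = tokens_generated(30, 24)    # 2,592,000 ≈ 2.6M
local_cost = 0.30                          # electricity, per the tweet

# Hosted: Sonnet 4.6 at 55 tps for 13 hours
hosted_tokens = tokens_generated(55, 13)   # 2,574,000 ≈ 2.6M
hosted_cost = 39.00                        # token pricing, per the tweet

print(f"local:  {local_tokens:,} tokens for ${local_cost:.2f}")
print(f"hosted: {hosted_tokens:,} tokens for ${hosted_cost:.2f}")
print(f"hosted costs {hosted_cost / local_cost:.0f}x more for ~the same token count")
```

Both runs land within 1% of 2.6M tokens, so the comparison really is like-for-like on output volume.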
DeepInfra @DeepInfra
We are excited to launch @NVIDIA Nemotron 3 Super on DeepInfra! Built for complex multi-agent applications, this open hybrid MoE model with 120B total/12B active params delivers up to 5x faster inference and supports a 1M-token context window — all optimized for efficient single-GPU deployment. Available now on the DeepInfra OpenAI-compatible API at $0.10 input / $0.50 output / $0.04 cached per 1M tokens.
[tweet media]
4 replies · 7 reposts · 23 likes · 3.5K views
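For readers budgeting a high-volume workload like the ones discussed in this thread, the posted per-million-token rates translate to daily spend straightforwardly. A minimal sketch; the rates are from the announcement above, while the request counts and token sizes are illustrative assumptions, not figures from the thread:

```python
# Estimate daily API spend at the posted Nemotron 3 Super rates:
# $0.10 input / $0.50 output / $0.04 cached, each per 1M tokens.
RATES = {"input": 0.10, "output": 0.50, "cached": 0.04}  # USD per 1M tokens

def daily_cost(requests: int, input_toks: int, output_toks: int,
               cached_toks: int = 0) -> float:
    """Daily spend in USD for a uniform mix of identical requests."""
    per_request = (input_toks * RATES["input"]
                   + output_toks * RATES["output"]
                   + cached_toks * RATES["cached"]) / 1_000_000
    return requests * per_request

# Hypothetical workload: 100k requests/day, 2k input + 500 output tokens each
print(f"${daily_cost(100_000, 2_000, 500):.2f} per day")  # $45.00 per day
```

Output tokens dominate the bill at these rates (5x the input price), so workloads with long reasoning traces pay disproportionately — which is presumably why reasoning length comes up elsewhere in this feed.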
Chris Griffin @csgriff_
@llm_wizard Peter actually told me that if Nvidia made a dense model quite similar to Qwen3.5 27b they would go to the top of the list
0 replies · 0 reposts · 0 likes · 22 views