The_tale_May_told...

2K posts

The_tale_May_told...

@_2nji_

M.S. NLP, Technical artist - writer @actvt_io @comicverse_ai

Paris, France Katılım Ağustos 2012

616 Takip Edilen423 Takipçiler

Sabitlenmiş Tweet

The_tale_May_told...@_2nji_·16 Eyl

I found myself constantly searching for activity monitor on spotlight multiple times a day, most times to get an idea of my CPU, GPU or memory load. I built actvt to highlight these information and as a bonus securely do the same for remote servers i juggle at work.

ACTVT@actvt_io

Actvt: Unified activity monitoring for CPU, GPU, Memory on Remote Servers & MacOs producthunt.com/products/actvt… via @producthunt

English

563

The_tale_May_told...@_2nji_·7h

@Foorack @vonderleyen Thank you!

English

Foorack / 風楽 🇪🇺@Foorack·8h

@_2nji_ @vonderleyen On GitHub github.com/eu-digital-ide…

English

Ursula von der Leyen@vonderleyen·11h

It is for parents to raise their children. Not platforms. The European Age Verification App is ready ↓ twitter.com/i/broadcasts/1…

English

3.2K

412

1.6K

The_tale_May_told...@_2nji_·9h

@vonderleyen Where's the source code?

English

Ursula von der Leyen@vonderleyen·11h

Our app ticks all the boxes. ✅ Highest privacy standards in the world ✅ Works on any device ✅ Easy to use ✅ Fully open source

English

531

113

554

200.4K

The_tale_May_told...@_2nji_·5d

@SpaceTimeViking @TheAhmadOsman @heydave7 Interesting stuff, will take a look at Dflash. Thanks

English

ÆON FORGE ✨@SpaceTimeViking·5d

@_2nji_ @TheAhmadOsman @heydave7 They do at least with Qwen 100% but the Gemma 4 ones it’s harder to tell. That said 50 Tok/s MoE vs 13 Tok/s Dense Makes the selection more complicated on something like a DGX Spark.

English

Dave Lee@heydave7·5d

This afternoon I picked up a new Nvidia DGX Spark computer with the goal of trying to run Gemma 4 31b (4bit) on it locally as a server. Just 1.5 hours later, it’s working! Using Open WebUI on my MacBook as the interface and it’s connecting to my DGX Spark running as a Gemma 4 server.

English

964

129.4K

The_tale_May_told...@_2nji_·5d

@SpaceTimeViking @TheAhmadOsman @heydave7 In my experience, the dense models (Qwen and Gemma) have a certain oomph over their MOE counterparts, I use Claude code for agentic stuff, my major local llm use cases are web search and llm server for development, so the Dense latency is fine for me.

English

ÆON FORGE ✨@SpaceTimeViking·5d

@_2nji_ @TheAhmadOsman @heydave7 Depends on what you are doing with it. I am sure he’s planing to do some agentic work with it. The Gemma 26B A4B IT ranks pretty close to the 31B dense but exponentially faster. I also quantized it to NVFP4 for the Blackwell hardware support. huggingface.co/AEON-7/Gemma-4…

English

The_tale_May_told...@_2nji_·5d

@TheAhmadOsman @heydave7 For a single user, running dense models are also fine.

English

137

Ahmad@TheAhmadOsman·5d

@heydave7 You're using the wrong model on this machine x.com/TheAhmadOsman/…

Ahmad@TheAhmadOsman

Only MoEs should be used on DGX Sparks Unified memory is bandwidth constrained, MoEs help a lot because only a small subset of parameters is processed per token In practice, MoEs are the difference between triple digit tokens/sec under concurrent load & single digit tokens/sec

English

8.4K

The_tale_May_told...@_2nji_·6d

One of my favourite usecases of my local LLM is having extremely weird discussions.

English

The_tale_May_told...@_2nji_·7 Nis

@mariofilhoml Highly Agree.

English

Mario Filho@mariofilhoml·7 Nis

it's gpt-2 FUD all over again

Anthropic@AnthropicAI

Introducing Project Glasswing: an urgent initiative to help secure the world’s most critical software. It’s powered by our newest frontier model, Claude Mythos Preview, which can find software vulnerabilities better than all but the most skilled humans. anthropic.com/glasswing

English

184

19.1K

The_tale_May_told...@_2nji_·7 Nis

@elder_plinius Doesn’t look ‘Gatekeep’ worthy to me.

English

810

Pliny the Liberator 🐉󠅫󠄼󠄿󠅆󠄵󠄐󠅀󠄼󠄹󠄾󠅉󠅭@elder_plinius·7 Nis

CLAUDE MYTHOS EVALS 🤯

Pliny the Liberator 🐉󠅫󠄼󠄿󠅆󠄵󠄐󠅀󠄼󠄹󠄾󠅉󠅭 tweet media

Anthropic@AnthropicAI

The Claude Mythos Preview system card is available here: anthropic.com/claude-mythos-…

English

1.3K

124.6K

The_tale_May_told...@_2nji_·7 Nis

@spark_arena Very comprehensive, thanks. Do you have one for llama.cpp?

English

sparkarena@spark_arena·7 Nis

Check out how nvidia/Gemma-4-31B-IT-NVFP4 achieved 10.96 tokens/sec on text generation on NVIDIA DGX Spark with vllm! View full benchmark at spark-arena.com/benchmark/sub1…

English

478

The_tale_May_told...@_2nji_·7 Nis

I think they have niches they are good at, your task was more suited to GPT 5.4, reminds me of using both Opus 4.6 and GPT 5.4 to do a CTF task, Opus found about 5 candidates and GPT 5.4 about 3, Both had a first option that was very plausible and did not appear on the others list and all their other options were slop.

English

664

antirez@antirez·7 Nis

The difference between the two, for serious engineering work, is simply brutal. Claude Code with Opus is, when the task at hand is very complicated, borderline useless, while GPT 5.4 can do a reverse engineering mixing: hardware knowledges, major disassembly skills, and so on.

English

794

108.7K

antirez@antirez·7 Nis

During the last week I executed very long autonomous sessions of Claude Code Opus 4.6 and Codex GPT 5.4 (both at max thinking budget), in cloned directories (refreshed every time one was behind). I burned a lot of (flat rate, my OSS free account + my PRO account) of tokens...

English

178

2.2K

912.2K

The_tale_May_told...@_2nji_·6 Nis

@bnjmn_marie Okay, Gave you a follow, I’ve been looking for a comprehensive eval.

The_tale_May_told...@_2nji_

What's the vibe for Gemma 4 31B, i haven't found any comprehensive benchmark comparison to Qwen 3.5 27B. also seems to be broken on LM Studio.

English

Benjamin Marie@bnjmn_marie·6 Nis

@_2nji_ No. It'll likely take a week

English

519

Benjamin Marie@bnjmn_marie·6 Nis

List of quantized Gemma 4 31B I’m evaluating: - Intel/gemma-4-31B-it-int4-AutoRound (19.2 GB) - cyankiwi/gemma-4-31B-it-AWQ-4bit (20.5 GB) - RedHatAI/gemma-4-31B-it-NVFP4 (23.3 GB) - nvidia/Gemma-4-31B-IT-NVFP4 (32.7 GB) - RedHatAI/gemma-4-31B-it-FP8-block (33.3 GB) → yes, NVIDIA’s NVFP4 checkpoint is as large as an FP8 checkpoint. This is what happens when you don’t quantize the attention layers of a dense model.

English

183

15.7K

The_tale_May_told...@_2nji_·6 Nis

Even if we never have AGI, these transformer based models are very complimentary to human intelligence, it fills the gap of memory storage with near perfect retrieval, and to have that at the scale of the internet, That’s the real win.

English

The_tale_May_told...@_2nji_·6 Nis

@fishright Haha, I absolutely love Qwen3.5 27B so I’m not in a hurry to take the extra effort it requires to use Gemma 4 31B.

English

Mike McQuade@fishright·6 Nis

@_2nji_ I think it’s better on Mac, it spends less time thinking.

English

The_tale_May_told...@_2nji_·5 Nis

What's the vibe for Gemma 4 31B, i haven't found any comprehensive benchmark comparison to Qwen 3.5 27B. also seems to be broken on LM Studio.

English

419

The_tale_May_told...@_2nji_·5 Nis

@AndrewMayne @ClementDelangue @NaveenGRao Alt Alt take: Because of the compute constraints APIs become stupidly expensive (High Demand & low Supply) compared with the same amount of tokens on their direct products and still makes it scary and uneconomical to compete.

English

Andrew Mayne@AndrewMayne·4 Nis

Alt take: Both OpenAI and Anthropic have mega-sized deals with AWS, Azure, etc to serve their models via API because these are such huge money earners for them. These are contractual obligations going into the next decade. APIs will continue to serve the majority of tokens. While Anthropic is struggling with capacity issues, their API business is huge and a major part of their revenue. I don't think they enjoy limiting API access right now and wish they were in OpenAI's shoes regarding capacity. But they're very smart and have lots of money. They'll figure it out. Meanwhile, OpenAI has even talked about eventually renting capacity in Stargate to other providers.

English

clem 🤗@ClementDelangue·4 Nis

I think it’s @NaveenGRao who said it before but wouldn’t be surprised if the frontier labs cut their APIs entirely at some point. In a compute constrained world, they’ll always prioritize their own direct products/customers. Makes it scary and unsustainable to only build on top of their APIs!

English

496

77.4K

The_tale_May_told...@_2nji_·5 Nis

@TheAhmadOsman I use Qwen 3.5 27B with llama.cpp on DGX Spark, average wait time is decent and I think there’s room to optimize my setup.

English

257

Ahmad@TheAhmadOsman·4 Nis

Which model to use locally with Hermes agent? on Unified Memory Hardware* > Gemma 4 26B-A4B on GPUs > Qwen 3.5 27B * Mac Studio, DGX Spark, MacBook, etc

English

422

40.2K

The_tale_May_told...@_2nji_·31 Mar

@badlogicgames @0xSero Yeah, hilarious cos they’re literally begging for market share from Claude code.

English

180

Mario Zechner@badlogicgames·30 Mar

ok, this is hilarious

Romain Huet@romainhuet

We’ve seen Claude Code users bring in Codex for code review and use GPT-5.4 for more complex tasks, so we thought: why not make that easier? Today we’re open sourcing a plugin for it! You can call Codex from Claude Code with your ChatGPT subscription. We love an open ecosystem!

English

361

71.6K

The_tale_May_told...@_2nji_·29 Mar

@menhguin I have an agent skill that uses rsync and /tmp to transfer files across devices in my tailnet, feels like one unified file system.

English

Minh Nhat Nguyen@menhguin·29 Mar

imo for primary device local RAM (above like, 24GB) is overkill, local storage is kinda helpful bc agentic file transfers back and forth can be a bit annoying. connectivity, compatibility and ease/longevity of always-on more important.

Lotto@LottoLabs

Spend your money on desktop hardware and ssh into it Don’t spend 5k on a MacBook

English

3.6K

The_tale_May_told...@_2nji_·28 Mar

@TheAhmadOsman Still stuck on the waitlist

English

Ahmad@TheAhmadOsman·28 Mar

I am happy to announce 2 new additions to the x/LocalLLaMA community mod team - 0xSero - sudoingX Hoping this helps streamline growth of the local inference community and deliver grounded, real content over hype Find the community spotlighted at the top of our profiles to join

English

458

26.6K

Keşfet

@Foorack @vonderleyen @SpaceTimeViking @TheAhmadOsman @heydave7 @mariofilhoml @elder_plinius @spark_arena