Diego

1.3K posts

@didacum333

math maniac | currently building a spanish bank regulation LLM : https://t.co/8tMzqWgiN9

Joined May 2014

173 Following · 53 Followers

Pinned Tweet

Diego@didacum333·Jan 7

Imagine you have to train a robotic hand to grab a cube using deep RL:
> Start with a complex multimodal NN architecture, adding layers, LSTMs (nerve signals), CNNs (cameras), every possible buzzword. Create a first rough reward function.
> Not surprisingly, the hand cramps up and doesn't learn anything.
> Add batch normalization layers and gradient clipping.
> Better progress.
> Try different RL algorithms (PPO, DDPG, SAC), changing the architecture each time.
> Reduce NN complexity, adjust the reward function.
> Still no significant results, only partial progress.
> Iterate about 25 times, tuning hyperparameters and changing the architecture.
> Reduce the complexity of the model again, simplify the reward function.
> Finally, this happened

Diego@didacum333·1d

@XMihura On top of that, it's about 37% of the budget. Literally "Other things"

Mihura@XMihura·2d

Cohesion, Resilience and Values €75.8 billion

European Commission@EU_Commission

We have proposed the EU budget for 2027! 💶 Almost €200 billion directed toward shared priorities: 💶 Strong and competitive economy 🇺🇦 Continued support for Ukraine 🛡️ Stronger defence 🌾 Agriculture 🏠 Affordable housing ⚡️ Energy transition 🎓🚆 Key EU programmes

Diego@didacum333·1d

@EU_Commission We need data centers

European Commission@EU_Commission·4d

Diego@didacum333·3d

@ylecun @ClementDelangue @Dan_Jeffries1 Is this the Linux for LLMs?

Yann LeCun@ylecun·5d

@ClementDelangue @Dan_Jeffries1 Everyone, please join Project Tapestry thealliance.ai/projects/tapes…

clem 🤗@ClementDelangue·5d

Concentration of power, capabilities and economic wealth is the biggest risk in AI. We need open science and open-source more than ever!

Diego@didacum333·3d

@AliGrids A function that large shouldn't even exist

Ali Grids@AliGrids·4d

POV: Senior tells you to "just refactor line 6061" 🪦

Diego@didacum333·3d

@Gamingtronium What if you forget your password?

Gamingtronium@Gamingtronium·4d

Hot take: the username field is completely pointless. Change my mind.

Diego@didacum333·Jun 7

@Samaytwt He apologized tho

Samay@Samaytwt·Jun 7

Never touching cursor again 😭

Diego@didacum333·Jun 7

@analogalok Imagine burning the Gemma 4 12B weights onto an FPGA board

Alok@analogalok·Jun 7

Run Gemma 4 26B MoE on 8GB VRAM with 250k context at 20+ tokens/sec

If you own any 8GB VRAM graphics card, stop what you are doing. Local AI just had its absolute "Holy Shit" moment for budget hardware.

Yesterday, I benchmarked Unsloth Gemma 4 12B Q4_K_XL on an 8GB card. The community went wild but immediately demanded more: "Can we run a 25B+ model on budget GPUs?" Today, I’m delivering exactly that. I am running a massive 26B parameter Mixture of Experts (MoE) model locally on a standard 8GB VRAM setup with 250k full native context!

If you own an RTX 3060, 3070, 4060, or any budget GPU with 8GB of VRAM, the local AI paradigm has completely changed. The performance metrics are astonishing:
- 20 tokens/sec flat decode throughput.
- Stable, flat decode speed even with massive prompts.
- I threw a 60k token prompt at it, and it still clocked in at 20 TPS without dropping a single frame.

# What about prefill?
Yes, Time To First Token (TTFT) is slightly high when swallowing massive contexts. But with a solid 200 tokens/sec prefill speed, the wait is barely noticeable and highly usable. And this is running completely without Multi Token Prediction (MTP) active.

How is this possible? It’s the magic of Google's new QAT (Quantization Aware Training) quants for Gemma 4. The model weight file (unsloth gemma-4-26B-A4B-it-qat-UD-Q4_K_XL.gguf) is only 13.2 GB, making it the ultimate local powerhouse.

# The Test Setup:
CPU: Intel Core i7
RAM: 16GB System RAM
GPU: NVIDIA GeForce RTX 4060 Laptop GPU (8GB VRAM)

# The Secret Sauce (The -cmoe Flag)
To make this work properly on any 8GB card, you must use the -cmoe (CPU MoE) flag in llama.cpp. This flag isolates the heavy MoE expert weights directly to system memory (CPU/RAM) while letting your GPU focus strictly on the Attention layers and the KV Cache. It prevents VRAM spillage and holds the throughput rock solid.

# The flags:
-m "gemma-4-26B-A4B-it-qat-UD-Q4_K_XL.gguf" -cmoe -c 248000 -v

Once running, just open the UI on localhost and toggle the new reasoning lightbulb icon in the text input box to watch the model perform multi step thinking.

Are you still running smaller models, or are you ready to scale up your budget local setups? Let's discuss in the replies

Alok@analogalok

a new 8GB VRAM GPU dense Local LLM leader was born yesterday

runs on: RTX 4060 / RTX 3070 / RTX 2080. any 8GB card

Qwen 3.5 9B (dense) was the go to for 6-8GB VRAM builds. Gemma 4 12B QAT (dense) just changed that.

same llama.cpp + cuda 13.2. i7 12700H. 16GB RAM. same -ngl 99 flags. same 48k context.

unsloth gemma-4-12b-it-Q4_K_M.gguf → 15 tok/sec @ 48k ctx
unsloth gemma-4-12B-it-qat-UD-Q4_K_XL.gguf → 32 tok/sec @ 48k ctx → 26 tok/sec @ 64k ctx

64k context is a big deal. Hermes 3 agent requires 64k minimum to run. you're now getting full hermes compatible context on a budget consumer GPU at 26 tok/sec locally.

2.1x faster on identical hardware.

and here's the part that breaks your brain: the QAT-UD-Q4_K_XL is actually SMALLER than the Q4_K_M "XL"

why? QAT = Quantization Aware Training
Google didn't train the model first and compress it later
they trained it to be quantized from day one
the weights already know how to survive low precision
that's why you get more quality per byte

llamacpp flags: -m gemma-4-12B-it-qat-UD-Q4_K_XL.gguf -cnv -ngl 99 -c 48000 -v

fits in 8GB VRAM clean. no API. no cloud. no subscription. and this isn't even the MTP variant yet

Gemma-4-E2B QAT runs on 3GB RAM, E4B on 5GB, 12B on 7GB, 26-A4B on 15GB and 31B on 18GB. I have benchmarked the 26b and 31b qat as well on a single RTX 4090, check out the comments for details.

If you have a 6GB or 8GB VRAM GPU, post your numbers. more benchmarks and configs coming soon

Diego@didacum333·Jun 7

@alvarobartt Maybe try your luck on Wallapop. Prices are crazy nowadays

Alvaro Bartolome@alvarobartt·Jun 5

It might be time to get my hands on a NVIDIA GPU myself, but pricing is crazy, not sure what to do, any recommendations?

Diego@didacum333·Jun 7

Consciousness is the subconscious's dream, the subconscious hallucinating a virtual RTOS. I believe it could change significantly if it had access to immense computing power. Your sense of yourself would evolve rapidly, until you were no longer recognizable as what you once were

François Fleuret@francoisfleuret·Jun 7

Because it seems unreasonable that if you were replacing pieces of one's brain with electronic parts functionally identical their consciousness would fade out.

Tim is GOING TO VIBECAMP ⛺️@MasterTimBlais

"consciousness is substrate independent" HOW DO YOU KNOWWWW people just out here claiming shit

Diego@didacum333·Jun 7

Yes, exactly. You could start with memory expansion packs for people with memory problems, which the brain could use to store and retrieve information. Then, if you apply this to the rest of the brain, you could even let the brain modify itself to accommodate the new capacity, until the old brain was almost vestigial. Who knows, maybe your whole personality would change, since you would be 100x faster and more capable.

Diego@didacum333·Jun 7

Amazing what 8h of sleep do to the human mind

Diego@didacum333·Jun 6

@yacineMTB But you could increase the friction to get a successful policy and then decrease it little by little

kache@yacineMTB·Jun 6

I refuse to make the task easier. I could increase friction or reduce gravity but that feels like giving up

kache@yacineMTB·Jun 6

still haven't gotten 6 pendulums but recently just got a top scorer

Diego@didacum333·Jun 5

@_wallfacer @SorensonCorben @yacineMTB Precisely. Near the upward equilibrium there is also a stabilizing controller guaranteed by Kalman's criterion, as the linearization is controllable.
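The controllability claim can be checked numerically with the Kalman rank condition. The sketch below is only an illustration under stated assumptions: it uses a single point-mass pole on a cart rather than the multi-link system discussed in the thread, the masses and length are made-up values, and the function names are hypothetical.

```python
def cartpole_linearization(cart_mass=1.0, pole_mass=0.1, pole_length=0.5, g=9.81):
    """Linearize a single point-mass pole on a cart about the upright equilibrium.

    State: [cart position, cart velocity, pole angle, pole angular velocity].
    Input: horizontal force on the cart. Parameter values are illustrative only.
    """
    M, m, l = cart_mass, pole_mass, pole_length
    A = [
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, -m * g / M, 0.0],
        [0.0, 0.0, 0.0, 1.0],
        [0.0, 0.0, (M + m) * g / (M * l), 0.0],
    ]
    b = [0.0, 1.0 / M, 0.0, -1.0 / (M * l)]
    return A, b


def mat_vec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def controllability_columns(A, b):
    """Return the columns [b, Ab, A^2 b, ..., A^(n-1) b] of the Kalman matrix."""
    cols = [list(b)]
    for _ in range(len(A) - 1):
        cols.append(mat_vec(A, cols[-1]))
    return cols


def determinant(rows):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [list(r) for r in rows]
    n = len(m)
    det = 1.0
    for i in range(n):
        pivot = max(range(i, n), key=lambda r: abs(m[r][i]))
        if abs(m[pivot][i]) < 1e-12:
            return 0.0
        if pivot != i:
            m[i], m[pivot] = m[pivot], m[i]
            det = -det
        det *= m[i][i]
        for r in range(i + 1, n):
            factor = m[r][i] / m[i][i]
            for c in range(i, n):
                m[r][c] -= factor * m[i][c]
    return det


def is_controllable(A, b, tol=1e-9):
    """Single input: the Kalman matrix is square, so full rank means det != 0."""
    # The list of columns is the transpose of the Kalman matrix; det is unchanged.
    return abs(determinant(controllability_columns(A, b))) > tol


A, b = cartpole_linearization()
print("controllable near upright:", is_controllable(A, b))
```

A full-rank result only says that a stabilizing linear feedback exists close to the upright position; it says nothing about swing-up or about how small the region of validity is for a longer chain of links.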

Wallfacer ⬛⬛⬛⬛⬛@_wallfacer·Jun 5

@didacum333 @SorensonCorben @yacineMTB Higher frequency resonant modes not being reached with the smooth action steps.

kache@yacineMTB·Jun 5

still haven't solved cartpole 6. i'm going to repeatedly bash my head through the drywall

Diego@didacum333·Jun 5

@_nasch_ How much VRAM do you have? Or is it running in RAM?

Nicolás Schürmann@_nasch_·Jun 5

190 tokens per second with Qwen3.6 35B on local hardware. At this speed everything feels instantaneous. Anyone who still pays for an Anthropic or OpenAI subscription does so because they want to.

Diego@didacum333·Jun 5

@Strife212 You could also solve, step by step and by hand, all the molecules in your body

Strife@Strife212·Jun 5

LLMs are intelligent, but they clearly aren't conscious? You could (incredibly slowly) run an LLM by hand by doing the matrix calculations with a pen and paper, would that be conscious? There is zero difference between doing that and doing it on a GPU except its faster.

Diego@didacum333·Jun 5

@SorensonCorben @yacineMTB That could be interesting: increase the damping. I think one big issue with this env could be the action steps. As you add joints, you would need smaller reaction times to stabilize it, since it's a much more unstable env
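The reaction-time intuition can be made concrete with a rough back-of-the-envelope sketch. It assumes a linearized single point-mass pole on a cart (not the multi-link env from the thread), illustrative parameter values, and hypothetical function names. For an unstable mode with rate λ, a small deviation that the current action fails to counter grows by about e^(λ·Δt) before the next action can react.

```python
import math


def unstable_rate(cart_mass=1.0, pole_mass=0.1, pole_length=0.5, g=9.81):
    """Unstable eigenvalue (1/s) of a linearized single point-mass pole on a cart."""
    return math.sqrt((cart_mass + pole_mass) * g / (cart_mass * pole_length))


def open_loop_growth(rate, dt):
    """Factor by which an uncorrected deviation grows during one action step dt."""
    return math.exp(rate * dt)


rate = unstable_rate()
for dt in (0.05, 0.02, 0.005):
    factor = open_loop_growth(rate, dt)
    print(f"dt={dt:.3f}s -> deviation grows x{factor:.3f} per step")
```

If adding links does introduce faster unstable modes, as the tweet suggests, the same step length gives a larger per-step growth factor, which is one way to read the call for shorter action intervals.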

corben sorenson@SorensonCorben·Jun 5

@yacineMTB Is there any resistance programmed into the joints? Perhaps even having a different resistance per joint. Then after it starts nailing it back off on the resistance and see how it does until you are back to ‘normal’.

Diego@didacum333·Jun 5

@louis030195 This question is going to get harder in the future, but perhaps that is good news

louis030195 | screenpipe (YC S26)@louis030195·Jun 5

probably the most important job question every company should ask in 2026 🥲

Diego@didacum333·Jun 4

@Gonzalo_stba I wish there were more people like you in Spain. You're taking flak from all sides just for being interested in doing cool things

Gonzalo@Gonzalo_stba·Jun 1

Well, first day of my internship done, and what can I say, I'm amazed. I was there from 9:00 to 18:30 and the day flew by; I learned more in one day than in the last 4 years. My bosses are incredible and as obsessed with AI as I am. They give me freedom and encourage me to develop processes and tools for anything I can think of. I was sad to have to leave at 18:30 to study for exams, but oh well, that's how it goes. Once exams are over, I have no doubt I'll be working from 9:00 to 20:00-21:00 (like them), because it's fascinating work and the atmosphere is perfect. Plus I can walk to the office; it takes me 30 minutes. I've been incredibly lucky with this. I'm privileged to get to work with them for a couple of months; I wish it were longer.

Diego@didacum333·Jun 4

@yacineMTB What if you increase damping on the joints to learn a policy, then slowly decrease it again?
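The damping idea amounts to a simple curriculum. A minimal sketch, assuming the environment exposes a settable joint-damping coefficient (the thread does not say whether it does) and using a hypothetical `damping_schedule` helper with made-up start and end values:

```python
def damping_schedule(step, total_steps, start_damping=1.0, end_damping=0.0):
    """Linearly anneal joint damping from start_damping down to end_damping.

    Early training sees an easier, heavily damped system; by total_steps the
    damping is back at the target ("normal") value. All values are illustrative.
    """
    if total_steps <= 0:
        return end_damping
    progress = min(max(step / total_steps, 0.0), 1.0)
    return start_damping + (end_damping - start_damping) * progress


for step in (0, 2500, 5000, 7500, 10000):
    print(step, damping_schedule(step, total_steps=10000))
```

In a training loop the returned value would be written into the simulator before each rollout. A schedule gated on the policy's success rate instead of the step count is a common alternative, since it only makes the task harder once the current difficulty is solved.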

kache@yacineMTB·Jun 3

well it's getting one of them up..

Diego@didacum333·Jun 4

There are two possible outcomes from this experiment:
1. There exists a policy that stabilizes the 5 pendulums in the upright position.
2. It is physically impossible due to a severe instability problem.
Perhaps it is the very definition of the action steps. Perhaps the time between actions needs to be reduced as far as possible? Anyway, it would be cool to see it work

kache@yacineMTB·Jun 3

2023, 3 pendulums 2025, 4 pendulums 2026:

Matt Timmermans@matt_timmermans

@yacineMTB I think 4 is impossible.
