Diego

1.3K posts

@didacum333

math maniac | currently building a Spanish bank regulation LLM: https://t.co/8tMzqWgiN9

Joined May 2014
173 Following, 53 Followers
Pinned Tweet
Diego
Diego@didacum333·
Imagine you have to train a robotic hand to grab a cube using deep RL:
> Start with a complex multimodal NN architecture, adding layers, LSTMs (nerve signals), CNNs (cameras), every possible buzzword. Create a first rough reward function.
> Not surprisingly, the hand gets cramped and doesn't learn anything.
> Add batch normalization layers and gradient clipping.
> Better progress.
> Try different RL algorithms (PPO, DDPG, SAC), changing the architecture each time.
> Reduce NN complexity, adjust the reward function.
> Still no significant results, only partial progress.
> Iterate about 25 times, tuning hyperparameters and changing the architecture.
> Reduce the complexity of the model again, simplify the reward function.
> Finally, this happened
English
1
1
3
372
Diego
Diego@didacum333·
@XMihura On top of that, it's about 37% of the budget. Literally "Other things"
Spanish
1
0
0
875
European Commission
European Commission@EU_Commission·
We have proposed the EU budget for 2027! 💶 Almost €200 billion directed toward shared priorities: 💶 Strong and competitive economy 🇺🇦 Continued support for Ukraine 🛡️ Stronger defence 🌾 Agriculture 🏠 Affordable housing ⚡️ Energy transition 🎓🚆 Key EU programmes
English
485
161
755
203.9K
clem 🤗
clem 🤗@ClementDelangue·
Concentration of power, capabilities and economic wealth is the biggest risk in AI. We need open science and open-source more than ever!
English
111
479
3.1K
161K
Diego
Diego@didacum333·
@AliGrids A function that large shouldn't even exist
English
0
0
0
943
Ali Grids
Ali Grids@AliGrids·
POV: Senior tells you to "just refactor line 6061" 🪦
English
57
60
1.9K
471.8K
Gamingtronium
Gamingtronium@Gamingtronium·
Hot take: the username field is completely pointless. Change my mind.
English
538
19
1.5K
380.8K
Samay
Samay@Samaytwt·
Never touching cursor again 😭
English
369
79
1.6K
256.4K
Diego
Diego@didacum333·
@analogalok Imagine burning the Gemma 4 12B weights onto an FPGA board
English
0
0
1
508
Alok
Alok@analogalok·
Run Gemma 4 26B MoE on 8GB VRAM with 250k context at 20+ tokens/sec
If you own any 8GB VRAM graphics card, stop what you are doing. Local AI just had its absolute "Holy Shit" moment for budget hardware.
Yesterday, I benchmarked Unsloth Gemma 4 12B Q4_K_XL on an 8GB card. The community went wild but immediately demanded more: "Can we run a 25B+ model on budget GPUs?"
Today, I’m delivering exactly that. I am running a massive 26B parameter Mixture of Experts (MoE) model locally on a standard 8GB VRAM setup with 250k full native context!
If you own an RTX 3060, 3070, 4060, or any budget GPU with 8GB of VRAM, the local AI paradigm has completely changed.
The performance metrics are astonishing:
- 20 tokens/sec flat decode throughput.
- Stable, flat decode speed even with massive prompts.
- I threw a 60k token prompt at it, and it still clocked in at 20 TPS without dropping a single frame.
# What about prefill?
Yes, Time To First Token (TTFT) is slightly high when swallowing massive contexts. But with a solid 200 tokens/sec prefill speed, the wait is barely noticeable and highly usable. And this is running completely without Multi Token Prediction (MTP) active.
How is this possible? It’s the magic of Google's new QAT (Quantization Aware Training) quants for Gemma 4. The model weight file (unsloth gemma-4-26B-A4B-it-qat-UD-Q4_K_XL.gguf) is only 13.2 GB, making it the ultimate local powerhouse.
# The Test Setup:
CPU: Intel Core i7
RAM: 16GB System RAM
GPU: NVIDIA GeForce RTX 4060 Laptop GPU (8GB VRAM)
# The Secret Sauce (The -cmoe Flag)
To make this work properly on any 8GB card, you must use the -cmoe (CPU MoE) flag in llama.cpp. This flag isolates the heavy MoE expert weights directly to system memory (CPU/RAM) while letting your GPU focus strictly on the Attention layers and the KV Cache. It prevents VRAM spillage and holds the throughput rock solid.
# The flags:
-m "gemma-4-26B-A4B-it-qat-UD-Q4_K_XL.gguf" -cmoe -c 248000 -v
Once running, just open the UI on localhost and toggle the new reasoning lightbulb icon in the text input box to watch the model perform multi step thinking.
Are you still running smaller models, or are you ready to scale up your budget local setups? Let's discuss in the replies
Alok@analogalok

a new 8GB VRAM GPU dense Local LLM leader was born yesterday
runs on: RTX 4060 / RTX 3070 / RTX 2080. any 8GB card
Qwen 3.5 9B (dense) was the go to for 6-8GB VRAM builds. Gemma 4 12B QAT (dense) just changed that.
same llama.cpp + cuda 13.2. i7 12700H. 16GB RAM. same -ngl 99 flags. same 48k context.
unsloth gemma-4-12b-it-Q4_K_M.gguf → 15 tok/sec @ 48k ctx
unsloth gemma-4-12B-it-qat-UD-Q4_K_XL.gguf → 32 tok/sec @ 48k ctx → 26 tok/sec @ 64k ctx
64k context is a big deal. Hermes 3 agent requires 64k minimum to run. you're now getting full hermes compatible context on a budget consumer GPU at 26 tok/sec locally.
2.1x faster on identical hardware.
and here's the part that breaks your brain: the QAT-UD-Q4_K_XL is actually SMALLER than the Q4_K_M "XL"
why? QAT = Quantization Aware Training
Google didn't train the model first and compress it later
they trained it to be quantized from day one
the weights already know how to survive low precision
that's why you get more quality per byte
llamacpp flags: -m gemma-4-12B-it-qat-UD-Q4_K_XL.gguf -cnv -ngl 99 -c 48000 -v
fits in 8GB VRAM clean. no API. no cloud. no subscription.
and this isn't even the MTP variant yet
Gemma-4-E2B QAT runs on 3GB RAM, E4B on 5GB, 12B on 7GB, 26-A4B on 15GB and 31B on 18GB.
I have benchmarked the 26b and 31b qat as well on a single RTX 4090, checkout the comments for details.
If you have a 6GB or 8GB VRAM GPU, post your numbers.
more benchmarks and configs coming soon

English
77
178
1.7K
287K
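A quick back-of-the-envelope check on the prefill and decode figures quoted in the 26B post above. This is only a sketch: it assumes prefill and decode throughput stay constant over the whole prompt, and the 1,000-token reply length is a made-up value for illustration, not a number from the post.

```python
def prefill_seconds(prompt_tokens: int, prefill_tok_per_s: float) -> float:
    """Time to first token if prefill throughput is constant (an assumption)."""
    return prompt_tokens / prefill_tok_per_s


def decode_seconds(output_tokens: int, decode_tok_per_s: float) -> float:
    """Time to generate a reply at a flat decode speed."""
    return output_tokens / decode_tok_per_s


# Figures quoted in the post: 60k-token prompt, 200 tok/s prefill, 20 tok/s decode.
ttft = prefill_seconds(60_000, 200.0)
# Hypothetical 1,000-token reply, not a number from the post.
reply = decode_seconds(1_000, 20.0)

print(f"time to first token: {ttft:.0f} s ({ttft / 60:.0f} min)")
print(f"reply generation:    {reply:.0f} s")
```

Under those assumptions, a 60k-token prompt waits about five minutes before the first token appears, so in this example the prefill wait is longer than the reply generation itself.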
Diego
Diego@didacum333·
@alvarobartt Maybe try your luck on Wallapop. Prices are crazy nowadays
English
1
0
0
14
Alvaro Bartolome
Alvaro Bartolome@alvarobartt·
It might be time to get my hands on an NVIDIA GPU myself, but pricing is crazy, not sure what to do, any recommendations?
English
2
0
0
236
Diego
Diego@didacum333·
Consciousness is the subconscious's dream, the subconscious hallucinating a virtual RTOS. I believe it could change significantly if it had access to immense computing power. Your idea of yourself would evolve rapidly, until you were no longer recognizable as what you once were
English
0
0
0
10
Diego
Diego@didacum333·
Yes, exactly. You could start with memory expansion packs for people with memory problems, which the brain could use to store and retrieve information. Then, if you applied this to the rest of the brain, you could even let the brain modify itself to accommodate the new capacity, until the old brain was almost vestigial. Who knows, maybe your whole personality would change, since you would be 100x faster and more capable.
English
0
0
0
6
Diego
Diego@didacum333·
Amazing what 8h of sleep does to the human mind
English
0
0
0
7
Diego
Diego@didacum333·
@yacineMTB But you could increase the friction to get a successful policy and then decrease it little by little
English
0
0
0
12
kache
kache@yacineMTB·
I refuse to make the task easier. I could increase friction or reduce gravity but that feels like giving up
English
15
0
36
4.3K
kache
kache@yacineMTB·
still haven't gotten 6 pendulums but recently just got a top scorer
English
12
0
85
7.8K
kache
kache@yacineMTB·
still haven't solved cartpole 6. i'm going to repeatedly bash my head through the drywall
English
36
0
162
12K
Diego
Diego@didacum333·
@_nasch_ How much VRAM do you have? Or is it running in RAM?
Spanish
0
0
0
360
Nicolás Schürmann
Nicolás Schürmann@_nasch_·
190 tokens per second with Qwen3.6 35B on local hardware. At this speed everything feels instantaneous. Those who have an Anthropic or OpenAI subscription have it because they want to.
Spanish
256
172
3K
416.4K
Diego
Diego@didacum333·
@Strife212 You could also work through all the molecules in your body step by step, by hand
English
0
0
0
1
Strife
Strife@Strife212·
LLMs are intelligent, but they clearly aren't conscious? You could (incredibly slowly) run an LLM by hand by doing the matrix calculations with a pen and paper, would that be conscious? There is zero difference between doing that and doing it on a GPU except it's faster.
English
824
275
9.2K
506.4K
Diego
Diego@didacum333·
@SorensonCorben @yacineMTB That could be interesting, increasing the damping. I think one big issue with this env could be the action steps: as you add joints, you need smaller reaction times to stabilize it, since it's a much more unstable env
English
1
0
2
24
corben sorenson
corben sorenson@SorensonCorben·
@yacineMTB Is there any resistance programmed into the joints? Perhaps even having a different resistance per joint. Then after it starts nailing it back off on the resistance and see how it does until you are back to ‘normal’.
English
2
0
1
152
Diego
Diego@didacum333·
@louis030195 This question is going to get harder in the future, but perhaps that is good news
English
0
0
0
48
Diego
Diego@didacum333·
@Gonzalo_stba I wish there were more people like you in Spain. You're getting flak from all sides just for being interested in doing cool things
Spanish
0
0
0
25
Gonzalo
Gonzalo@Gonzalo_stba·
Well, first day of my internship done, and what can I say, I'm amazed. I was there from 9:00 to 18:30 and the day flew by; I learned more in one day than in the last 4 years. My bosses are incredible and as obsessed with AI as I am; they give me freedom and encourage me to develop processes and tools for anything I can think of. I was sad to have to leave at 18:30 to study for exams, but oh well, that's how it goes. As soon as exams are over I have no doubt I'll be working from 9:00 to 20:00-21:00 (like them), because it's really interesting and the atmosphere is perfect. Plus I can walk to the office, it takes me 30 min. I've been incredibly lucky with this; I'm privileged to be able to work with them for a couple of months, I wish it were longer.
Spanish
94
3
311
732.8K
Diego
Diego@didacum333·
@yacineMTB What if you increase damping on the joints to learn a policy, then slowly decrease it again?
English
0
0
0
62
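The damping idea in the reply above can be sketched as a simple curriculum schedule. This is a minimal illustration, not anything from kache's actual setup: the function name, the start and end damping values, and the hold fraction are all hypothetical, and how the value is applied to the simulator's joints is left out on purpose.

```python
def damping_schedule(step: int, total_steps: int,
                     start: float = 1.0, end: float = 0.05,
                     hold_fraction: float = 0.2) -> float:
    """Joint damping for a given training step (all defaults are made-up values).

    Damping is held at `start` for the first `hold_fraction` of training so a
    policy can be found on the easier task, then linearly annealed to `end`,
    the damping of the original, harder task.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if not 0.0 <= hold_fraction < 1.0:
        raise ValueError("hold_fraction must be in [0, 1)")
    hold_steps = int(hold_fraction * total_steps)
    if step <= hold_steps:
        return start
    # Fraction of the annealing phase completed, clamped to 1.0.
    progress = min(1.0, (step - hold_steps) / (total_steps - hold_steps))
    return (1.0 - progress) * start + progress * end


# Print the schedule at a few points of a hypothetical 1,000-step run.
for s in (0, 200, 600, 1000):
    print(s, damping_schedule(s, 1000))
```

In a training loop, the returned value would be written into the environment's joint damping before each rollout. Whether a policy learned at high damping survives the annealing is exactly the open question in the thread.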
kache
kache@yacineMTB·
well it's getting one of them up..
English
22
1
114
11.7K
Diego
Diego@didacum333·
There are two possible outcomes from this experiment:
1. There exists a policy such that it stabilizes the 5 pendulums in position.
2. It is physically impossible due to a severe instability problem.
Perhaps it is the very definition of the action steps. Perhaps the time between actions needs to be reduced as much as possible? Anyway, it would be cool to see it work
English
0
0
1
111
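One way to make the action-step concern above concrete: for a linearized unstable system, a small deviation grows roughly like exp(rate * dt) between two actions, where rate is the fastest unstable growth rate. The sketch below uses the single inverted pendulum, whose rate is sqrt(g / length). The length, the time steps, and the idea that longer chains have a faster unstable rate are assumptions for illustration, not measurements from the experiment.

```python
import math


def pendulum_unstable_rate(length_m: float, g: float = 9.81) -> float:
    """Growth rate (1/s) of a linearized single inverted pendulum: sqrt(g / L)."""
    return math.sqrt(g / length_m)


def growth_per_action(unstable_rate: float, action_dt: float) -> float:
    """Factor by which a small deviation grows during one uncorrected action interval."""
    return math.exp(unstable_rate * action_dt)


def max_action_dt(unstable_rate: float, max_growth: float) -> float:
    """Largest action interval that keeps the per-step growth below `max_growth`."""
    return math.log(max_growth) / unstable_rate


rate = pendulum_unstable_rate(1.0)  # hypothetical 1 m link
for dt in (0.05, 0.02, 0.005):      # hypothetical control intervals in seconds
    print(f"dt={dt:.3f}s  growth per action={growth_per_action(rate, dt):.3f}")
```

If adding links does raise the fastest unstable rate, the same growth budget per action requires a proportionally smaller interval, which is consistent with the suggestion to shorten the time between actions.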