Ivan Rocha

7.8K posts

Ivan Rocha
@irr
Lisbon · Joined March 2009
1.5K Following · 622 Followers
Ivan Rocha retweeted
Nucleus AI @withnucleusai
Introducing Nucleus-Image: the first sparse Mixture-of-Experts diffusion model. 17B parameters, only 2B active: 10x more parameter-efficient than leading diffusion models. Toe-to-toe with GPT Image 1, Imagen 4, and Qwen-Image, from pure pre-training alone. No DPO. No RL. No preference tuning. Day 0 support in 🤗 Hugging Face diffusers. Fully open-source under Apache 2.0. Weights, training code, and dataset recipe: we're not holding anything back <3
20 replies · 60 reposts · 428 likes · 27.2K views
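The "17B total, 2B active" efficiency claim comes from sparse Mixture-of-Experts routing. As a minimal sketch (not Nucleus-Image's actual architecture — the expert functions, router scores, and k value below are illustrative), a router scores all experts per token but only the top-k are ever evaluated:

```python
# Sparse MoE routing sketch: only top-k experts run per input,
# so active parameters are a small fraction of total parameters.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, experts, router_scores, k=2):
    """Evaluate only the top-k experts; mix outputs by renormalized gate weights."""
    topk = sorted(range(len(experts)), key=lambda i: router_scores[i], reverse=True)[:k]
    gates = softmax([router_scores[i] for i in topk])
    # Only k of len(experts) expert functions execute -- the source of
    # the "17B total / 2B active" parameter efficiency.
    return sum(g * experts[i](x) for g, i in zip(gates, topk))

# Toy experts: each is just a scalar function here.
experts = [lambda x: 2 * x, lambda x: x + 1, lambda x: -x, lambda x: x * x]
scores = [0.1, 3.0, -1.0, 2.0]   # router logits for this token
y = moe_forward(5.0, experts, scores, k=2)
```

With these toy scores, experts 1 and 3 are selected and their outputs blended; the other two never run.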
Ivan Rocha retweeted
X Freeze @XFreeze
Grok-4.20 just took the #1 spot in the world for Medicine & Healthcare on Text Arena. Grok is already saving lives by identifying critical conditions that human doctors miss; there have been many real cases where Grok saved lives ❤️ Grok is officially outperforming every other model on the leaderboard, completely crushing Claude Opus 4.6, Gemini 3.1 Pro, and all other competitors. This is massive. Healthcare is one of the most important fields where humanity needs help, and right now, Grok does it best
305 replies · 539 reposts · 2.7K likes · 16.1M views
Ivan Rocha retweeted
Viacheslav Biriukov
⚡️ Zig 0.16 is out. And the new I/O model is a huge shift.
• Swap implementations (threaded, evented, etc.)
• Write code that looks blocking but runs async
• Composable like allocators
ziglang.org/download/0.16.… #zig #ziglang
8 replies · 67 reposts · 484 likes · 45.6K views
Ivan Rocha retweeted
Md Ismail Šojal 🕷️
A fully local 26B MoE model built for red teaming and bug hunting: BugTraceAI Apex. Trained on elite bug reports and real evasion tactics, and DPO fine-tuned for a hunter mindset. Claude sees your payloads in logs; that's why BugTraceAI Apex is a 26B local MoE for real red teamers.
- Executes WAF bypasses with internal thinking blocks.
- Enforces deep internal reasoning before generating any output.
- Delivers production-grade WAF/EDR evasion with AES-256-GCM obfuscation.
- Zero refusals; trained on real-world elite reports and evasion techniques.
Fits in 16.7GB. Runs on an RTX 3060.
huggingface.co/BugTraceAI/Bug…
6 replies · 88 reposts · 471 likes · 35.7K views
Ivan Rocha @irr
Learn all about Reinforcement Learning (RL) and how to train your own DeepSeek-R1 reasoning model with Unsloth using GRPO. A complete guide from beginner to advanced unsloth.ai/docs/get-start…
0 replies · 0 reposts · 2 likes · 72 views
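The core idea the GRPO guide linked above builds on: sample a group of completions per prompt, score each with a reward function, and replace a value network with group-relative advantages. A hedged sketch in plain Python — the reward function and sample completions are illustrative, not taken from the Unsloth docs:

```python
# GRPO group-relative advantage sketch: each reward is normalized
# against the mean and std of its own sampling group.
import statistics

def correctness_reward(completion, answer):
    # Toy reward: 1.0 if the completion contains the expected answer.
    return 1.0 if answer in completion else 0.0

def group_advantages(rewards):
    """Advantage = (reward - group mean) / group std; no learned critic needed."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # avoid divide-by-zero on uniform groups
    return [(r - mean) / std for r in rewards]

completions = ["... so the answer is 42", "... answer: 41", "answer is 42", "no idea"]
rewards = [correctness_reward(c, "42") for c in completions]
advs = group_advantages(rewards)
```

Completions that beat their group's average get positive advantages and are reinforced; the rest are pushed down — the same signal DeepSeek-R1-style reasoning training uses.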
Ivan Rocha retweeted
Sumanth @Sumanth_077
Fine-tuning massive LLMs used to be painfully slow, but not anymore! 4 open source libraries that accelerate fine-tuning of Large Language Models:

1. Unsloth AI
• Fine-tune models like Qwen3, Llama 4, and Gemma 3 up to 2× faster with 70% less VRAM
• Uses optimized Triton kernels and manual backprop for exact accuracy
• Supports low-resource setups and runs on consumer GPUs or even Colab/Kaggle with ~3 GB VRAM
GitHub repo → github.com/unslothai/unsl…

2. LLaMA Factory
• Fine-tune over 100 models (LLaMA, Mistral, Gemma, etc.) using a simple CLI or WebUI
• Supports LoRA, QLoRA, full or frozen fine-tuning across 2–8‑bit precision
• Includes built-in dataset templates, training monitors, and model export options
GitHub repo → github.com/hiyouga/LlamaF…

3. DeepSpeed
• Built for large-scale distributed fine-tuning with ZeRO and FSDP
• Optimized for multi-GPU and multi-node training with advanced memory management
• Trusted in production environments for scalable LLM training
GitHub repo → github.com/deepspeedai/De…

4. Axolotl
• YAML-based setup for fine-tuning, LoRA/QLoRA, DPO, GRPO, and multimodal workflows
• Includes kernel optimizations for memory-efficient training
• Actively maintained with support for Hugging Face, model export, and inference
GitHub repo → github.com/axolotl-ai-clo…
12 replies · 168 reposts · 708 likes · 32.6K views
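All four libraries above lean on LoRA/QLoRA for memory efficiency. The trick, sketched here in plain Python under the original LoRA formulation (the matrices and alpha value are illustrative): instead of updating a full d_out × d_in weight matrix, train two thin factors B (d_out × r) and A (r × d_in) with r much smaller than d.

```python
# LoRA sketch: the effective weight is W' = W + (alpha / r) * B @ A,
# where only the low-rank factors A and B receive gradients.

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def lora_effective_weight(W, A, B, alpha):
    """Merge a rank-r LoRA adapter (B @ A) into the frozen base weight W."""
    r = len(A)                      # LoRA rank = number of rows of A
    delta = matmul(B, A)
    scale = alpha / r
    return [[w + scale * d for w, d in zip(wr, dr)] for wr, dr in zip(W, delta)]

# 4x4 base weight with a rank-1 adapter: 8 trainable numbers instead of 16.
W = [[0.0] * 4 for _ in range(4)]
A = [[1.0, 2.0, 3.0, 4.0]]          # r x d_in  = 1 x 4
B = [[1.0], [0.0], [0.0], [1.0]]    # d_out x r = 4 x 1
W_adapted = lora_effective_weight(W, A, B, alpha=1.0)
```

At realistic sizes (say 4096 × 4096 with r = 16) the adapter is roughly 0.8% of the base matrix's parameters, which is why these setups fit on consumer GPUs.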
Ivan Rocha retweeted
Bilgin Ibryam @bibryam
Claude Code Skill for Generating Architecture Diagrams with Excalidraw medium.com/@ooi_yee_fei/c…
7 replies · 91 reposts · 611 likes · 51.7K views
Ivan Rocha retweeted
Eric Johnson @edjgeek
S3 Files just launched. Lambda can mount S3 buckets now. I built AI agents that share a mounted workspace, orchestrated by a durable function. Full SAM template included, every CloudFormation gotcha documented. edjgeek.com/blog/s3-files-…
3 replies · 8 reposts · 68 likes · 5.1K views
Ivan Rocha retweeted
Nicolas Krassas @Dinosn
Built Claude Skills for Governance, Risk, and Compliance frameworks (ISO 27001, SOC 2, FedRAMP, GDPR, HIPAA, NIST CSF, PCI DSS, TSA Cybersecurity, and ISO 42001) sushegaad.github.io/Claude-Skills-…
11 replies · 112 reposts · 818 likes · 91.1K views
Ivan Rocha retweeted
François Fleuret @francoisfleuret
Reminder that I wrote a little book about deep learning, which is phone-formatted, entirely free, and nearing 1M downloads: fleuret.org/francois/lbdl.…
34 replies · 245 reposts · 2.2K likes · 122K views
Ivan Rocha @irr
Gemma 4 26B-A4B MoE PRISM-PRO-Dynamic-Quant
PRISM-PRO: production model with over-refusal and bias mechanisms completely removed using the state-of-the-art PRISM pipeline
DQ: per-tensor-class mixed-precision allocation
huggingface.co/Ex0bit/MYTHOS-…
0 replies · 0 reposts · 1 like · 118 views
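"Per-tensor-class mixed-precision allocation" builds on ordinary per-tensor quantization: every tensor gets its own scale, and more sensitive tensor classes can be allotted more bits. The sketch below is a generic symmetric absmax int8 example, not the PRISM/DQ pipeline's actual method; the weight values are illustrative.

```python
# Per-tensor absmax quantization sketch: map [-absmax, absmax] onto
# signed integers, storing one float scale per tensor.

def quantize_absmax(values, bits=8):
    """Symmetric absmax quantization for a single tensor (flattened to a list)."""
    qmax = 2 ** (bits - 1) - 1             # 127 for int8
    absmax = max(abs(v) for v in values) or 1.0
    scale = absmax / qmax                  # one scale per tensor
    q = [round(v / scale) for v in values]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.5, -1.0, 0.25, 0.75]
q, scale = quantize_absmax(weights, bits=8)
restored = dequantize(q, scale)
```

A mixed-precision scheme would simply call this with different `bits` per tensor class, e.g. keeping attention projections at 8 bits while pushing MoE expert weights lower.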
Ivan Rocha retweeted
Viacheslav Biriukov
🐹 Go: Calling a Rust library without CGO. Good post about internals and trade-offs:
• how cgo affects portability and cross-compilation
• what actually happens in FFI
• where the real overhead is
• why skipping cgo can buy simplicity
stoolap.io/blog/2026/04/0… #golang #go
2 replies · 19 reposts · 178 likes · 8.1K views
Ivan Rocha retweeted
Kyle Hessling @KyleHessling1
Gemopus-4-26B-A4B from Jackrong is LIVE! Happy to have benched this one pretty hard (see my benches in the model card) and it is an excellent finetune of an already exceptional model! My friend Jackrong is always cooking the greatest!

It rocks at one-shot requests over long contexts, and runs incredibly fast thanks to the MoE architecture while not seeming to take as much of a hit vs dense models as in the Qwen 3.5 series. It also crushed my simple needle-in-the-haystack tests all the way out to an extended context of 524k!

If you're VRAM-starved, or running on unified memory, this one should run much more usably offloaded to system RAM or in unified memory pools, even if you're running a 10GB-or-less GPU! It would be my daily driver for this purpose!

That said, the dense 31B Gemopus 4 is finalising now; I will post it here when it's live, so follow me for the official launch, and follow Jackrong on Hugging Face! It will also be an incredible model!

As with the base Gemma 4 models, there are some idiosyncrasies, especially in harnesses; if you have problems, please let us know! If you make something cool with it, please comment that below, too. We'd love to see it!

huggingface.co/Jackrong/Gemop…
10 replies · 9 reposts · 145 likes · 8K views
Ivan Rocha retweeted
Joel - coffee/acc @JoelDeTeves
I'm pretty excited to test this one: Gemopus-4-26B-A4B-it-GGUF Q6_K, using @spiritbuun's Llama.cpp TurboQuant fork:
- Speed: 75 tokens/sec
- VRAM usage: 95% (22.7 GB)
- Context size: 131072
- GPU: RTX A5000 (Ampere) 24 GB

Pretty amazing that you can fit this entire model on GPU with Q6 quality and still have room for a large amount of context! Plus MoE models are still fast at higher quality.

Woodchuck Norris vibe check: PASSED
Square root of 999999999 -> correct
Hermes Agent -> interesting behavior. Retains 26B's speed on short prompts, thinks deeply for more complex requests; sometimes thinks a little too much, so it might be worth playing with top + temp settings
Coding test -> one-shotted a fully working Tetris game; no other MoE model, including vanilla 26B, was able to do this

A very interesting model.

-m Gemopus-4-26B-A4B-it-Preview-Q6_K.gguf --n-gpu-layers 99 --ctx-size 131072 --cont-batching --cache-type-k turbo4 --cache-type-v turbo4 --fit on --jinja --reasoning-format auto --flash-attn on

huggingface.co/Jackrong/Gemop…
11 replies · 10 reposts · 159 likes · 7.8K views