Ivan Rocha

7.8K posts

Ivan Rocha
@irr
Lisbon · Joined March 2009
1.5K Following · 622 Followers
Ivan Rocha retweeted
Nucleus AI @withnucleusai
Introducing Nucleus-Image: the first sparse Mixture-of-Experts diffusion model. 17B parameters, only 2B active: 10x more parameter-efficient than leading diffusion models. Toe-to-toe with GPT Image 1, Imagen 4, and Qwen-Image, from pure pre-training alone. No DPO. No RL. No preference tuning. Day 0 support in 🤗 Hugging Face diffusers. Fully open-source under Apache 2.0. Weights, training code, and dataset recipe: we're not holding anything back <3
20 replies · 60 reposts · 428 likes · 27.2K views
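The "17B total, 2B active" efficiency claim comes from sparse Mixture-of-Experts routing. As a minimal sketch (not Nucleus-Image's actual architecture — the expert functions, router scores, and k value below are illustrative), a router scores all experts per token but only the top-k are ever evaluated:

```python
# Sparse MoE routing sketch: only top-k experts run per input,
# so active parameters are a small fraction of total parameters.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, experts, router_scores, k=2):
    """Evaluate only the top-k experts; mix outputs by renormalized gate weights."""
    topk = sorted(range(len(experts)), key=lambda i: router_scores[i], reverse=True)[:k]
    gates = softmax([router_scores[i] for i in topk])
    # Only k of len(experts) expert functions execute -- the source of
    # the "17B total / 2B active" parameter efficiency.
    return sum(g * experts[i](x) for g, i in zip(gates, topk))

# Toy experts: each is just a scalar function here.
experts = [lambda x: 2 * x, lambda x: x + 1, lambda x: -x, lambda x: x * x]
scores = [0.1, 3.0, -1.0, 2.0]   # router logits for this token
y = moe_forward(5.0, experts, scores, k=2)
```

With these toy scores, experts 1 and 3 are selected and their outputs blended; the other two never run.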
Ivan Rocha retweeted
X Freeze @XFreeze
Grok-4.20 just took the #1 spot in the world for Medicine & Healthcare on Text Arena. Grok is already saving lives by identifying critical conditions that human doctors miss; there have been many real cases where Grok saved lives ❤️ Grok is officially outperforming every other model on the leaderboard, completely crushing Claude Opus 4.6, Gemini 3.1 Pro, and all other competitors. This is massive. Healthcare is one of the most important fields where humanity needs help, and right now, Grok does it best
305 replies · 539 reposts · 2.7K likes · 16.1M views
Ivan Rocha retweeted
Viacheslav Biriukov
⚡️ Zig 0.16 is out. And the new I/O model is a huge shift.
• Swap implementations (threaded, evented, etc.)
• Write code that looks blocking but runs async
• Composable like allocators
ziglang.org/download/0.16.… #zig #ziglang
8 replies · 67 reposts · 484 likes · 45.6K views
Ivan Rocha retweeted
Md Ismail Šojal 🕷️
A fully local 26B MoE model built for red teaming and bug hunting: BugTraceAI Apex. Trained on elite bug reports and real evasion tactics, and DPO fine-tuned for a hunter mindset. Claude sees your payloads in logs; that's why BugTraceAI Apex is a 26B local MoE for real red teamers.
- Executes WAF bypasses with internal thinking blocks.
- Enforces deep internal reasoning before generating any output.
- Delivers production-grade WAF/EDR evasion with AES-256-GCM obfuscation.
- Zero refusals; trained on real-world elite reports and evasion techniques.
Fits in 16.7GB. Runs on an RTX 3060.
huggingface.co/BugTraceAI/Bug…
6 replies · 88 reposts · 471 likes · 35.7K views
Ivan Rocha @irr
Learn all about Reinforcement Learning (RL) and how to train your own DeepSeek-R1 reasoning model with Unsloth using GRPO. A complete guide from beginner to advanced unsloth.ai/docs/get-start…
0 replies · 0 reposts · 2 likes · 72 views
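The core idea the GRPO guide linked above builds on: sample a group of completions per prompt, score each with a reward function, and replace a value network with group-relative advantages. A hedged sketch in plain Python — the reward function and sample completions are illustrative, not taken from the Unsloth docs:

```python
# GRPO group-relative advantage sketch: each reward is normalized
# against the mean and std of its own sampling group.
import statistics

def correctness_reward(completion, answer):
    # Toy reward: 1.0 if the completion contains the expected answer.
    return 1.0 if answer in completion else 0.0

def group_advantages(rewards):
    """Advantage = (reward - group mean) / group std; no learned critic needed."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # avoid divide-by-zero on uniform groups
    return [(r - mean) / std for r in rewards]

completions = ["... so the answer is 42", "... answer: 41", "answer is 42", "no idea"]
rewards = [correctness_reward(c, "42") for c in completions]
advs = group_advantages(rewards)
```

Completions that beat their group's average get positive advantages and are reinforced; the rest are pushed down — the same signal DeepSeek-R1-style reasoning training uses.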
Ivan Rocha retweeted
Sumanth @Sumanth_077
Fine-tuning massive LLMs used to be painfully slow, but not anymore! 4 open source libraries that accelerate fine-tuning of Large Language Models:

1. Unsloth AI
• Fine-tune models like Qwen3, Llama 4, and Gemma 3 up to 2× faster with 70% less VRAM
• Uses optimized Triton kernels and manual backprop for exact accuracy
• Supports low-resource setups and runs on consumer GPUs or even Colab/Kaggle with ~3 GB VRAM
GitHub repo → github.com/unslothai/unsl…

2. LLaMA Factory
• Fine-tune over 100 models (LLaMA, Mistral, Gemma, etc.) using a simple CLI or WebUI
• Supports LoRA, QLoRA, full or frozen fine-tuning across 2–8‑bit precision
• Includes built-in dataset templates, training monitors, and model export options
GitHub repo → github.com/hiyouga/LlamaF…

3. DeepSpeed
• Built for large-scale distributed fine-tuning with ZeRO and FSDP
• Optimized for multi-GPU and multi-node training with advanced memory management
• Trusted in production environments for scalable LLM training
GitHub repo → github.com/deepspeedai/De…

4. Axolotl
• YAML-based setup for fine-tuning, LoRA/QLoRA, DPO, GRPO, and multimodal workflows
• Includes kernel optimizations for memory-efficient training
• Actively maintained with support for Hugging Face, model export, and inference
GitHub repo → github.com/axolotl-ai-clo…
12 replies · 168 reposts · 708 likes · 32.6K views
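All four libraries above lean on LoRA/QLoRA for memory efficiency. The trick, sketched here in plain Python under the original LoRA formulation (the matrices and alpha value are illustrative): instead of updating a full d_out × d_in weight matrix, train two thin factors B (d_out × r) and A (r × d_in) with r much smaller than d.

```python
# LoRA sketch: the effective weight is W' = W + (alpha / r) * B @ A,
# where only the low-rank factors A and B receive gradients.

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def lora_effective_weight(W, A, B, alpha):
    """Merge a rank-r LoRA adapter (B @ A) into the frozen base weight W."""
    r = len(A)                      # LoRA rank = number of rows of A
    delta = matmul(B, A)
    scale = alpha / r
    return [[w + scale * d for w, d in zip(wr, dr)] for wr, dr in zip(W, delta)]

# 4x4 base weight with a rank-1 adapter: 8 trainable numbers instead of 16.
W = [[0.0] * 4 for _ in range(4)]
A = [[1.0, 2.0, 3.0, 4.0]]          # r x d_in  = 1 x 4
B = [[1.0], [0.0], [0.0], [1.0]]    # d_out x r = 4 x 1
W_adapted = lora_effective_weight(W, A, B, alpha=1.0)
```

At realistic sizes (say 4096 × 4096 with r = 16) the adapter is roughly 0.8% of the base matrix's parameters, which is why these setups fit on consumer GPUs.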
Ivan Rocha retweeted
Bilgin Ibryam @bibryam
Claude Code Skill for Generating Architecture Diagrams with Excalidraw medium.com/@ooi_yee_fei/c…
7 replies · 91 reposts · 611 likes · 51.7K views
Ivan Rocha retweeted
Eric Johnson @edjgeek
S3 Files just launched. Lambda can mount S3 buckets now. I built AI agents that share a mounted workspace, orchestrated by a durable function. Full SAM template included, every CloudFormation gotcha documented. edjgeek.com/blog/s3-files-…
3 replies · 8 reposts · 68 likes · 5.1K views
Ivan Rocha retweeted
Nicolas Krassas @Dinosn
Built Claude Skills for Governance, Risk, and Compliance frameworks (ISO 27001, SOC 2, FedRAMP, GDPR, HIPAA, NIST CSF, PCI DSS, TSA Cybersecurity, and ISO 42001) sushegaad.github.io/Claude-Skills-…
11 replies · 112 reposts · 818 likes · 91.1K views
Ivan Rocha retweeted
François Fleuret @francoisfleuret
Reminder that I wrote a little book about deep learning, which is phone-formatted, entirely free, and nearing 1M downloads: fleuret.org/francois/lbdl.…
34 replies · 245 reposts · 2.2K likes · 122K views
Ivan Rocha @irr
Gemma 4 26B-A4B MoE PRISM-PRO-Dynamic-Quant
PRISM-PRO: production model with over-refusal and bias mechanisms completely removed using the state-of-the-art PRISM pipeline
DQ: per-tensor-class mixed-precision allocation
huggingface.co/Ex0bit/MYTHOS-…
0 replies · 0 reposts · 1 like · 118 views
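"Per-tensor-class mixed-precision allocation" builds on ordinary per-tensor quantization: every tensor gets its own scale, and more sensitive tensor classes can be allotted more bits. The sketch below is a generic symmetric absmax int8 example, not the PRISM/DQ pipeline's actual method; the weight values are illustrative.

```python
# Per-tensor absmax quantization sketch: map [-absmax, absmax] onto
# signed integers, storing one float scale per tensor.

def quantize_absmax(values, bits=8):
    """Symmetric absmax quantization for a single tensor (flattened to a list)."""
    qmax = 2 ** (bits - 1) - 1             # 127 for int8
    absmax = max(abs(v) for v in values) or 1.0
    scale = absmax / qmax                  # one scale per tensor
    q = [round(v / scale) for v in values]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.5, -1.0, 0.25, 0.75]
q, scale = quantize_absmax(weights, bits=8)
restored = dequantize(q, scale)
```

A mixed-precision scheme would simply call this with different `bits` per tensor class, e.g. keeping attention projections at 8 bits while pushing MoE expert weights lower.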
Ivan Rocha retweeted
Viacheslav Biriukov
🐹 Go: Calling a Rust library without CGO. Good post about internals and trade-offs:
• how cgo affects portability and cross-compilation
• what actually happens in FFI
• where the real overhead is
• why skipping cgo can buy simplicity
stoolap.io/blog/2026/04/0… #golang #go
2 replies · 19 reposts · 178 likes · 8.1K views
Ivan Rocha retweeted
Kyle Hessling @KyleHessling1
Gemopus-4-26B-A4B from Jackrong is LIVE! Happy to have benched this one pretty hard (see my benches in the model card) and it is an excellent finetune of an already exceptional model! My friend Jackrong is always cooking the greatest!

It rocks at one-shot requests over long contexts, and runs incredibly fast thanks to the MoE architecture while not seeming to take as much of a hit vs dense models as in the Qwen 3.5 series. It also crushed my simple needle-in-the-haystack tests all the way out to an extended context of 524k!

If you're VRAM-starved, or running on unified memory, this one should run much more usably offloaded to system RAM or in unified memory pools, even if you're running a 10GB-or-less GPU! It would be my daily driver for this purpose!

That said, the dense 31B Gemopus 4 is finalising now; I will post it here when it's live, so follow me for the official launch, and follow Jackrong on Hugging Face! It will also be an incredible model!

As with the base Gemma 4 models, there are some idiosyncrasies, especially in harnesses; if you have problems, please let us know! If you make something cool with it, please comment that below, too. We'd love to see it!

huggingface.co/Jackrong/Gemop…
10 replies · 9 reposts · 145 likes · 8K views
Ivan Rocha retweeted
Joel - coffee/acc @JoelDeTeves
I'm pretty excited to test this one: Gemopus-4-26B-A4B-it-GGUF Q6_K, using @spiritbuun's Llama.cpp TurboQuant fork:
- Speed: 75 tokens/sec
- VRAM usage: 95% (22.7 GB)
- Context size: 131072
- GPU: RTX A5000 (Ampere) 24 GB

Pretty amazing that you can fit this entire model on GPU with Q6 quality and still have room for a large amount of context! Plus MoE models are still fast at higher quality.

Woodchuck Norris vibe check: PASSED
Square root of 999999999 -> correct
Hermes Agent -> interesting behavior. Retains 26B's speed on short prompts, thinks deeply for more complex requests; sometimes thinks a little too much, so it might be worth playing with top + temp settings
Coding test -> one-shotted a fully working Tetris game; no other MoE model, including vanilla 26B, was able to do this

A very interesting model.

-m Gemopus-4-26B-A4B-it-Preview-Q6_K.gguf --n-gpu-layers 99 --ctx-size 131072 --cont-batching --cache-type-k turbo4 --cache-type-v turbo4 --fit on --jinja --reasoning-format auto --flash-attn on

huggingface.co/Jackrong/Gemop…
11 replies · 10 reposts · 159 likes · 7.8K views