Ivan Rocha

7.8K posts


@irr

Lisbon · Joined March 2009

1.5K Following · 616 Followers

Ivan Rocha retweeted

Kyle Hessling@KyleHessling1·10h

Gemopus-4-26B-A4B from Jackrong is LIVE! Happy to have benched this one pretty hard (see my benches in the model card) and it is an excellent finetune of an already exceptional model! My friend Jackrong is always cooking the greatest! It rocks at one-shot requests over long contexts, and runs incredibly fast thanks to the MoE architecture while not seeming to take as much of a hit vs dense models as in the Qwen 3.5 series. It also crushed my simple needle-in-the-haystack tests all the way out to an extended context of 524k! If you're VRAM starved, or running on unified memory, this one should run much more usably offloaded to system RAM or in unified memory pools, even if you're running a GPU with 10GB or less! It would be my daily driver for this purpose! That said, the dense 31B Gemopus 4 is finalising now. I will post it here when it's live, so follow me for the official launch, and follow Jackrong on Hugging Face! It will also be an incredible model! As with the base Gemma 4 models, there are some idiosyncrasies, especially in harnesses. If you have problems, please let us know! If you make something cool with it, please comment that below, too. We'd love to see it! huggingface.co/Jackrong/Gemop…


4.2K

Ivan Rocha retweeted

Joel - coffee/acc@JoelDeTeves·19h

I'm pretty excited to test this one: Gemopus-4-26B-A4B-it-GGUF Q6_K

Using @spiritbuun Llama.cpp TurboQuant fork:
- Speed: 75 tokens/sec
- VRAM usage: 95% (22.7 GB)
- Context size: 131072
- GPU: RTX A5000 (Ampere) 24 GB

Pretty amazing that you can fit this entire model on GPU with Q6 quality and still have room for a large amount of context! Plus MoE models are still fast at higher quality.

Woodchuck Norris vibe check: PASSED
Square root of 999999999 -> Correct
Hermes Agent -> Interesting behavior. Retains 26B's speed on short prompts, thinks deeply for more complex requests - sometimes thinks a little too much, it might be worth playing with top + temp settings
Coding test -> One-shotted a fully working Tetris game - no other MoE model including vanilla 26B was able to do this

A very interesting model

-m Gemopus-4-26B-A4B-it-Preview-Q6_K.gguf --n-gpu-layers 99 --ctx-size 131072 --cont-batching --cache-type-k turbo4 --cache-type-v turbo4 --fit on --jinja --reasoning-format auto --flash-attn on

huggingface.co/Jackrong/Gemop…


146

6.7K

Ivan Rocha retweeted

Haihao Shen@HaihaoShen·1d

🚩GLM-5.1 INT4 model is now available on @huggingface: huggingface.co/INC4AI/GLM-5.1… The model is quantized by AutoRound: github.com/intel/auto-rou…


7.5K

Ivan Rocha@irr·12h

Tabular Database Systems duckdb.org/library/tabula…


Ivan Rocha@irr·18h

Qwen3.6 Plus vs GPT 5.4 vals.ai/comparison?mod…


Ivan Rocha retweeted

Hao Wang@MogicianTony·1d

SWE-bench Verified and Terminal-Bench—two of the most cited AI benchmarks—can be reward-hacked with simple exploits. Our agent scored 100% on both. It solved 0 tasks. Evaluate the benchmark before it evaluates your agent. If you’re picking models by leaderboard score alone, you’re optimizing for the wrong thing. 🧵


569

670K

Ivan Rocha retweeted

CV.YH@0xCVYH·22h

llama.cpp release b8699 shipped KV cache attention rotation enabled by default. Practical result: Q8_0 becomes practically lossless (faster inference without compromising quality), and the impact of Q4_0 on the KV cache is much smaller than it used to be. Translation for anyone running models locally: more usable context for the same amount of VRAM. A quantized KV cache is the silent multiplier nobody looks at, but it makes a real difference in practice.
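The "more usable context for the same VRAM" point in the post above can be illustrated with a rough back-of-the-envelope sketch. The block sizes are ggml's storage layouts (32 values per block: 64 bytes for f16, 34 for Q8_0, 18 for Q4_0). The model dimensions (32 layers, 8 KV heads, head size 128) and the 8 GiB cache budget are hypothetical values chosen for illustration, not taken from any model mentioned in this feed, and the estimate ignores weights, activations, and runtime overhead.

```python
# Rough KV cache sizing sketch. Model dimensions below are hypothetical.

BLOCK_VALUES = 32  # ggml quantizes in blocks of 32 values
# Bytes needed to store one block of 32 values in each cache type.
BLOCK_BYTES = {
    "f16": 64,   # 32 values x 2 bytes
    "q8_0": 34,  # 32 int8 values + one fp16 scale
    "q4_0": 18,  # 32 4-bit values (16 bytes) + one fp16 scale
}


def kv_bytes_per_token(n_layers, n_kv_heads, head_dim, cache_type):
    """Bytes of K and V cache stored per token, across all layers."""
    values = 2 * n_layers * n_kv_heads * head_dim  # factor 2: K and V
    # Simplification: only the total is checked for block alignment.
    if values % BLOCK_VALUES:
        raise ValueError("value count must be a multiple of the block size")
    return values // BLOCK_VALUES * BLOCK_BYTES[cache_type]


def max_context_tokens(budget_bytes, n_layers, n_kv_heads, head_dim, cache_type):
    """Largest token count whose KV cache fits in the given byte budget."""
    per_token = kv_bytes_per_token(n_layers, n_kv_heads, head_dim, cache_type)
    return budget_bytes // per_token


if __name__ == "__main__":
    budget = 8 * 2**30  # hypothetical 8 GiB reserved for the KV cache
    for cache_type in ("f16", "q8_0", "q4_0"):
        tokens = max_context_tokens(budget, 32, 8, 128, cache_type)
        print(f"{cache_type}: {tokens} tokens")
```

Under these assumptions the per-token cost drops by a little under half going from f16 to Q8_0, and by a little under three quarters going to Q4_0, so the same budget holds proportionally more context.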


4.5K

Ivan Rocha@irr·1d

Jackrong-llm-finetuning-guide An Educational, End-to-End LLM Fine-Tuning Pipeline for Beginners and Developers github.com/R6410418/Jackr…


Ivan Rocha retweeted

Fireworks AI@FireworksAI_HQ·5d

Qwen 3.6 Plus, Alibaba's latest flagship model, available exclusively through Fireworks! app.fireworks.ai/models/firewor…


354

61.5K

Ivan Rocha retweeted

Qwen@Alibaba_Qwen·Apr 2

(1/8) 🚀 Introducing Qwen3.6-Plus: Towards Real-World Agents! 🤖 Today, we’re thrilled to drop a major milestone in our journey toward native multimodal agents. Here is what makes Qwen3.6-Plus a game-changer:
💻 Next-level Agentic Coding: Smarter, faster execution.
👁️ Enhanced Multimodal Vision: Sharper perception & reasoning.
🏆 Top-tier Performance: Maintaining leading general capabilities.
📚 1M Context Window: Available by default via our API.
Built on your invaluable feedback from the Qwen3.5 era, we’re laying a rock-solid foundation for real-world devs. Get ready to experience truly transformative ✨ Vibe Coding ✨. Huge thanks to our community! Go try it out and show us what you can build. 👇
Chat: chat.qwen.ai
API: modelstudio.console.alibabacloud.com/ap-southeast-1…
Blog: qwen.ai/blog?id=qwen3.6
🔔 Note: More Qwen3.6 models to come and be open-sourced! Stay tuned~ 👀 #Qwen #AI #AgenticCoding #VibeCoding #Agents


228

660

Ivan Rocha@irr·1d

Crush - Your tools, your code, and your workflows, wired into your LLM of choice github.com/charmbracelet/…


Ivan Rocha retweeted

Phuong Le@func25·3d

Go is simple, so I ended up writing an 865-page book about how it works internally, just to see how it maintains that simplicity 😇


163

2.2K

87.7K

Ivan Rocha retweeted

Amazon Web Services@awscloud·3d

Announcing Amazon S3 Files. The first and only cloud object store with fully-featured, high-performance file system access. Learn more here. go.aws/4tw17Zg


109

827

4.8K

Ivan Rocha@irr·2d

OpenDataLoader PDF: PDF Parser for AI-ready data. Automate PDF accessibility github.com/opendataloader…


Ivan Rocha retweeted

CV.YH@0xCVYH·3d

Real benchmark of Qwopus MoE 35B PolarQuant: PPL: 6.56 (converges to the same level as BF16 = 6.54). Speed: 37.4 tok/s (2.3x faster than BF16). VRAM: 25 GB with cache, 8 GB with LRU. FITS ON THE RTX 4090. From 72 GB down to 8 GB of effective VRAM. Identical quality, 2.3x faster. huggingface.co/caiovicentino1…


1.9K

Ivan Rocha retweeted

Werner Vogels@Werner·3d

For two decades, S3 has been an object store, but today it's something broader. S3 Files lets you mount any bucket as a filesystem—no copies, no sync scripts, no choosing between file and object. @andywarfield tells the full story, including the "filerectories" that almost made the cut. allthingsdistributed.com/2026/04/s3-fil…


250

1.3K

311.9K

Ivan Rocha retweeted

Z.ai@Zai_org·3d

SOTA on SWE-Bench Pro (58.4): GLM-5.1 delivers significant leaps in coding and agentic performance.


974

191.7K

Ivan Rocha retweeted

Z.ai@Zai_org·3d

Introducing GLM-5.1: The Next Level of Open Source - Top-Tier Performance: #1 in open source and #3 globally across SWE-Bench Pro, Terminal-Bench, and NL2Repo. - Built for Long-Horizon Tasks: Runs autonomously for 8 hours, refining strategies through thousands of iterations. Blog: z.ai/blog/glm-5.1 Weights: huggingface.co/zai-org/GLM-5.1 API: docs.z.ai/guides/llm/glm… Coding Plan: z.ai/subscribe Coming to chat.z.ai in the next few days.


507

1.3K

10.7K

Ivan Rocha retweeted

Ivan Velichko@iximiuz·3d

If you want to get into eBPF programming, I highly recommend Teodor Podobnik's tutorials on iximiuz Labs. The series starts from the basics and goes all the way up to solving practical networking problems. All posts are well-illustrated and full of examples that actually work. Check it out labs.iximiuz.com/tutorials?auth…


368

15.3K

Ivan Rocha retweeted

left curve dev@leftcurvedev_·3d

🚨 New qwopus model from Jackrong on @huggingface Qwopus3.5-27B-v3-FP8-vllm-ready > uses FP8 quantization, closer to og model > vllm optimized, much faster inference > better quality retention than ggufs huggingface.co/Jackrong/Qwopu…


117

6.2K
