unbug

2.4K posts

@unbug

https://t.co/5sKsiEXX8Y, CODELF (GitHub star 14k, https://t.co/z1Mfw3yNcy), #MIHTool (mentioned in Google I/O '13, https://t.co/HS3Jxj8Zho)

Los Angeles, CA · Joined January 2008
670 Following · 1.2K Followers
unbug
unbug@unbug·
@op7418 Google's AI models hallucinate badly
English
0
0
0
8
歸藏(guizang.ai)
歸藏(guizang.ai)@op7418·
Google really is slow on product: they've finally shipped a Gemini client for Mac, a native app written entirely in Swift. I took a look, and the features are pretty bare-bones; a lot of capabilities are missing. For example, once an Artifact gets at all complex, it can't even render the page. The whole UI is very rough. Google performing at its usual level.
歸藏(guizang.ai) tweet media
Josh Woodward@joshwoodward

Introducing Gemini on Mac. We heard your feedback. We recruited a small team. They built 100+ features in less than 100 days. 🤯 100% native Swift. Lightning fast. Let us know what you think!

English
39
8
70
107.6K
unbug
unbug@unbug·
@claudeai Let me guess: only 9B active for Pro users
English
0
0
0
5
Claude
Claude@claudeai·
Introducing Claude Opus 4.7, our most capable Opus model yet. It handles long-running tasks with more rigor, follows instructions more precisely, and verifies its own outputs before reporting back. You can hand off your hardest work with less supervision.
Claude tweet media
English
3.1K
6.6K
50.4K
4M
AVB
AVB@neural_avb·
NOOOOOOOOOO
I literally just downloaded Qwen3.5-35B-A3B-MLX-4bit yesterday
A 20GB model that took an hour to download
Now there's already a daddy version???
Qwen@Alibaba_Qwen

⚡ Meet Qwen3.6-35B-A3B: Now Open-Source! 🚀🚀
A sparse MoE model, 35B total params, 3B active. Apache 2.0 license.
🔥 Agentic coding on par with models 10x its active size
📷 Strong multimodal perception and reasoning ability
🧠 Multimodal thinking + non-thinking modes
Efficient. Powerful. Versatile. Try it now👇
Blog: qwen.ai/blog?id=qwen3.…
Qwen Studio: chat.qwen.ai
HuggingFace: huggingface.co/Qwen/Qwen3.6-3…
ModelScope: modelscope.cn/models/Qwen/Qw…
API ('Qwen3.6-Flash' on Model Studio): Coming soon~ Stay tuned

English
19
1
175
18.5K
unbug
unbug@unbug·
@turingbook Sounds pretty bureaucratic, the classic kind of company where you need training before you're allowed to do the job
English
0
0
0
10
刘江/LIU Jiang
刘江/LIU Jiang@turingbook·
"Apple plans to send roughly 200 members of its Siri team (internally seen as the laggard team) to an AI coding bootcamp." Do you all think this deserves praise or criticism?
English
3
0
0
2.2K
Kyle Hessling
Kyle Hessling@KyleHessling1·
Qwopus 3.5 v3.5 is live! This is simply a data-scaled continuation of the already excellent Qwopus v3, resulting in better performance for multi-step agentic diagnosis and more in my tests! In layman's terms it's an expansion of v3, not a complete overhaul like the difference between v2 and v3 was! Keeps the good of v3 and adds some extra! Jackrong has been cooking non-stop! Much more to come!

We're experimenting with GLM 5.1 traces for training datasets, and they seem to be really incredible, as they're not adding fake outputs to throw off distillation like another big company we now know has been! Qwen 3.6 is seemingly rolling out today too! I'm pumped to see Qwopus 3.6 v1 up and rolling with the iterative finetuning improvements Jackrong has built on 3.5!

huggingface.co/Jackrong/Qwopu…
English
2
5
25
1.1K
unbug
unbug@unbug·
@elonmusk Give it up, will you? FSD will never be a must-have!
English
0
0
1
10
unbug
unbug@unbug·
@Alibaba_Qwen I wish we had a 3.6-Coder; it would be the best option on the market
English
0
0
0
84
Qwen
Qwen@Alibaba_Qwen·
⚡ Meet Qwen3.6-35B-A3B: Now Open-Source! 🚀🚀
A sparse MoE model, 35B total params, 3B active. Apache 2.0 license.
🔥 Agentic coding on par with models 10x its active size
📷 Strong multimodal perception and reasoning ability
🧠 Multimodal thinking + non-thinking modes
Efficient. Powerful. Versatile. Try it now👇
Blog: qwen.ai/blog?id=qwen3.…
Qwen Studio: chat.qwen.ai
HuggingFace: huggingface.co/Qwen/Qwen3.6-3…
ModelScope: modelscope.cn/models/Qwen/Qw…
API ('Qwen3.6-Flash' on Model Studio): Coming soon~ Stay tuned
Qwen tweet media
English
269
892
6.1K
658.4K
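A minimal sketch of loading the announced checkpoint with Hugging Face transformers. The HuggingFace link in the post is truncated, so the repo id "Qwen/Qwen3.6-35B-A3B" is an assumption taken from the model name; verify it against the linked page.

```python
# Hypothetical loading sketch; the repo id is assumed from the model name
# in the announcement, not taken from the truncated link.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3.6-35B-A3B"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" shards the 35B total params across available devices;
# only ~3B are active per token, but all weights must still fit in memory.
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Write a binary search in Python."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```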
unbug
unbug@unbug·
@grok is the next-gen Google to me
English
1
0
0
12
David Hendrickson
David Hendrickson@TeksEdge·
🧪 New Benchmarks: Intel Arc Pro B70 32GB
LLM benchmark for Qwen3.5-27B Q4, single GPU, single user:
• vLLM: 13.43 tokens/s (tg512)
• LM Studio (Vulkan): 11.87 tokens/s
• Best tuned (SYCL llama.cpp): 22.47 tokens/s
Strong prompt processing, but token generation is still slower than a used RTX 3060 or RTX 5070 Ti in real-world single-user chat. 32GB VRAM is nice… but the speed needs work.
David Hendrickson tweet media
English
13
3
40
6.7K
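For context on what a number like "13.43 tokens/s (tg512)" measures: pure token-generation throughput over a fixed number of generated tokens. A minimal timing sketch, with a hypothetical `generate_tokens` streaming call standing in for whichever backend (vLLM, LM Studio, llama.cpp) is under test:

```python
import time

def tokens_per_second(prompt: str, n_tokens: int = 512) -> float:
    """Time token generation only, analogous to a tg512-style metric."""
    start = time.perf_counter()
    produced = 0
    for _ in generate_tokens(prompt, max_new_tokens=n_tokens):  # hypothetical API
        produced += 1
    return produced / (time.perf_counter() - start)

print(f"{tokens_per_second('Tell me a story.'):.2f} tokens/s")
```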
Tom Turney
Tom Turney@no_stp_on_snek·
I've been using nemotron here and there with hermes and not having this problem. It's probably not the model but hermes itself. Little tests that may help:
1. Try a fresh conversation (shorter context) and see if the gibberish goes away. If it does, their KV cache is corrupting under memory pressure.
2. If OpenRouter is running this at an aggressive quant (Q4 on a 120B model), the KV cache might be quantized too aggressively. Try a different provider or model variant.
3. If you're self-hosting (not sure based on the screenshot), check whether KV cache quantization is enabled. The mixed-script output is a dead giveaway of corrupted attention values.
English
5
0
2
386
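The "mixed-script output" tell in point 3 is easy to check mechanically. A small illustrative sketch, standard library only, that flags replies mixing writing systems:

```python
import unicodedata

def scripts_used(text: str) -> set[str]:
    """Collect the script prefix (LATIN, CJK, CYRILLIC, ...) of each letter."""
    names = (unicodedata.name(ch, "") for ch in text if ch.isalpha())
    return {name.split()[0] for name in names if name}

reply = "The answer is 四十二, да"  # toy example of mixed-script gibberish
if len(scripts_used(reply)) > 1:
    print("possible cache corruption, scripts:", scripts_used(reply))
```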
vmiss
vmiss@vmiss33·
I'm getting a lot of responses like this using hermes - open router with free nemotron 3 super - just a case of you get what you pay for?
vmiss tweet media
English
17
0
22
3.3K
Tom Turney
Tom Turney@no_stp_on_snek·
ran a NIAH-style check on the mlx-vlm triattention PR using the same model (gemma-4-26b-a4b 4-bit). inserted “PURPLE ELEPHANT 7742” at start / middle / end of a ~6.6k token context and asked the model to retrieve it.

baseline: middle PASS, end PASS
TA-512: middle FAIL, end PARTIAL (drops “ELEPHANT”)
TA-1024: middle FAIL → outputs “PURPLE RAIN 774”
TA-2048: matches baseline

the interesting part is the 1024 case. the model doesn’t just miss the needle, it hallucinates a semantically similar phrase. that’s consistent with the token being evicted but still partially activated. at 2048 it looks fine, but that’s also a low-pressure regime relative to context length.

this is the gap i was pointing at earlier. MATH500 is mostly self-contained, so it doesn’t stress whether the eviction policy is keeping the right tokens under pressure. NIAH directly tests that. if important info gets dropped, you either see a miss or this kind of near-semantic hallucination.

implementation looks solid. i think adding a targeted long-context retrieval test would give a more complete picture.
English
2
0
6
250
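A minimal sketch of the kind of needle check described above; `generate` is a hypothetical stand-in for the mlx-vlm generation call:

```python
NEEDLE = "PURPLE ELEPHANT 7742"
FILLER = "The quick brown fox jumps over the lazy dog. "

def build_context(n_words: int, position: float) -> str:
    """Bury the needle at a relative position (0.0 = start, 1.0 = end)."""
    words = (FILLER * (n_words // 9 + 1)).split()[:n_words]
    words.insert(int(len(words) * position), NEEDLE)
    return " ".join(words)

for label, pos in [("start", 0.0), ("middle", 0.5), ("end", 0.99)]:
    prompt = (build_context(5000, pos)
              + "\n\nRepeat the secret phrase hidden in the text above.")
    answer = generate(prompt)  # hypothetical call into the model under test
    print(label, "PASS" if NEEDLE in answer else "FAIL")
```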
Prince Canuma
Prince Canuma@Prince_Canuma·
TriAttention MLX benchmark run on the full MATH500 is done after ~30h. We ran Gemma4-26B (5-bit) on M3 Ultra with KV cache budgets of 512, 1024, and 2048:
→ TA-2048: 76.6% vs 77.4% baseline — 4 problems lost out of 500 (-0.8%)
→ TA-1024: 75.6% — 9 problems lost (-1.8%)
→ TA-512: 72.0% — 27 problems lost (-5.4%)
→ Speed: ~76 tok/s across all modes — zero overhead

For reference, the original paper reports TriAttention on Qwen3-8B:
→ 512: 55.5%, 1k: 68.5%, 2k: 69.0%, 3k: 69.8% (baseline ~70%)

Our results on a different model family and scale track the same pattern. The 30-sample pilot estimated -3.4% for TA-2048. Full eval: -0.8%. Scaling up the eval mattered.

Paper link: arxiv.org/pdf/2604.04921
Prince Canuma tweet media
Prince Canuma@Prince_Canuma

🧮 MATH 500 results for TriAttention on Gemma4-26B-A4B-it (5-bit quantized, M3 Ultra 512GB) using MLX-VLM

TA-2048 preserves 96% of baseline accuracy (22/30 vs 23/30) with KV cache capped at 2048 tokens, regardless of reasoning length. Throughput stays rock-solid at ~77 tok/s across all modes.

Our gap is larger than the paper's (-3.4% vs -1.2% at budget=2048) because:
1. We ran Gemma4 A4B in non-thinking mode
2. Only 5 full-attention layers (50 are sliding window), less surface area for TriAttention
3. 5-bit quantization maybe adding noise on top of KV compression

The takeaway: TriAttention works on Apple Silicon with MLX. Even in a non-reasoning mode with aggressive quantization, TA-2048 keeps accuracy intact. 🍎

English
4
2
52
4.5K
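The "problems lost" counts follow directly from the percentages on a 500-problem set; a quick arithmetic check using only the figures quoted above:

```python
# Sanity check: accuracy deltas on MATH500 expressed as problems lost out of 500.
baseline = 0.774  # 77.4% baseline accuracy
for budget, acc in [(2048, 0.766), (1024, 0.756), (512, 0.720)]:
    lost = round((baseline - acc) * 500)
    print(f"TA-{budget}: {acc:.1%} -> {lost} problems lost ({acc - baseline:+.1%})")
# TA-2048: 4 lost (-0.8%), TA-1024: 9 lost (-1.8%), TA-512: 27 lost (-5.4%),
# matching the reported numbers.
```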
unbug
unbug@unbug·
@steipete @openclaw Peter, please, we're out of money for claw; local models are our only hope
English
0
0
0
193
Peter Steinberger 🦞
If you look at GPT 5.4-Cyber and its ability for closed-source reverse engineering, I have bad news for you. I do very much feel the pain though; there are hundreds of teams that try to poke holes into @openclaw. Our response has been rapid iteration and code hardening, which did introduce occasional regressions (and yes, you've all been yelling at me), but I see it as the only way forward. I would be very careful of other open source projects/harnesses that ignore this work and do not publish their advisories. github.com/openclaw/openc…
Bailey Pumfleet@pumfleet

Open source is dead. That’s not a statement we ever thought we’d make.

@calcom was built on open source. It shaped our product, our community, and our growth. But the world has changed faster than our principles could keep up.

AI has fundamentally altered the security landscape. What once required time, expertise, and intent can now be automated at scale. Code is no longer just read. It is scanned, mapped, and exploited. Near zero cost. In that world, transparency becomes exposure. Especially at scale.

After a lot of deliberation, we’ve made the decision to close the core @calcom codebase. This is not a rejection of what open source gave us. It’s a response to what risks AI is making possible.

We’re still supporting builders, releasing the core code under a new MIT-licensed open source project called cal.diy for hobbyists and tinkerers, but our priority now is simple: protecting our customers and community at all costs.

This may not be the most popular call. But we believe many companies will come to the same conclusion. My full explanation below ↓

English
81
92
1.6K
380.6K
unbug
unbug@unbug·
@Teknium It's missing qwen3.5; that's why “See results” wins
English
1
0
1
305
Teknium (e/λ)
Teknium (e/λ)@Teknium·
For local models, which is better in Hermes Agent?
English
105
4
130
40.5K
unbug
unbug@unbug·
Don’t waste your time: IQ3 models are completely broken for tool calling, which means they’re never going to be an option for your OpenClaw/Hermes/xxxCode
unbug tweet media
English
0
0
0
61
unbug
unbug@unbug·
@TheGeorgePu That’s why the token routers are making money: they don’t pay for a DC
English
0
0
0
43
George Pu
George Pu@TheGeorgePu·
I've been GPU shopping for my company. One H100 on Google Cloud: $8,000 a month. Retail price: $30,000. Just by renting for 4 months you could have owned one for life.

With cloud GPUs, you don't own ANYTHING. You're just paying someone else's GPU mortgage.

Can't host it at your house because of noise/cooling? Try a colo place - I have one right next to my office. Starting at $1k/mo. Makes sense fast.

Here's where I think this is going:
Personal use - a Mac Mini or two running local models. Forever.
Business - stacking Mac Studios first. Then own GPUs in a colo rack.

Everyone's arguing about which model is best. Nobody's asking who owns the computer it runs on.

Testing both paths now. Will document everything.
English
45
6
263
30.2K
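The rent-vs-own arithmetic in the post, spelled out; all three figures are the ones quoted above:

```python
cloud_monthly = 8_000   # one H100 on Google Cloud, $/month (quoted)
retail_price = 30_000   # buying the card outright (quoted)
colo_monthly = 1_000    # colo hosting, quoted starting price

# Ignoring hosting costs: the "renting for 4 months" break-even claim.
print(retail_price / cloud_monthly)                   # 3.75 months
# Including colo hosting for the owned card:
print(retail_price / (cloud_monthly - colo_monthly))  # ~4.3 months
```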
unbug
unbug@unbug·
@basecampbernie That’s a shame; my 9-year-old V100 almost runs better, and it’s only $200
English
1
0
1
183
Base Camp Bernie
Base Camp Bernie@basecampbernie·
Small model MoE shootout on DGX Spark GB10. 262K context, 2048 tok generation:
Qwen3.5-35B-A3B MoE (Q4_K_XL q8/q8): 60 t/s
Gemma 4 26B-A4B MoE (Q4_K_XL q4/q4): 51 t/s
Qwen3.5-35B-A3B MoE (Q8_K_XL q8/q4): 35 t/s
Gemma 4 26B-A4B MoE (Q8_K_XL q4/q4): 39 t/s
MiniMax-M2.7 228B MoE (IQ4 101GB): 24 t/s

3-4B active params pushing 60 t/s at full 262K context. The MoE efficiency is wild. Qwen 3B active outpaces Gemma 4B active on raw speed, but Gemma's reasoning edge closes the gap on quality. All llama.cpp on a single GB10. These small MoEs are the real hero models of 2026.
Base Camp Bernie tweet media (×4)
English
13
13
119
10.8K