unbug

2.4K posts

@unbug

https://t.co/5sKsiEXX8Y, CODELF (GitHub star 14k, https://t.co/z1Mfw3yNcy), #MIHTool (mentioned in Google I/O '13, https://t.co/HS3Jxj8Zho)

Los Angeles, CA · Joined January 2008
670 Following · 1.2K Followers
unbug
unbug@unbug·
@op7418 Google's AI models hallucinate badly
English
0
0
0
8
歸藏(guizang.ai)
歸藏(guizang.ai)@op7418·
Google really is slow on product: they've finally shipped a Gemini client for Mac, a native app written entirely in Swift. I took a look, and the features are pretty bare-bones; a lot of capabilities are missing. For example, once an Artifact gets at all complex, it can't even render the page. The whole UI is very rough. Google performing at its usual level.
歸藏(guizang.ai) tweet media
Josh Woodward@joshwoodward

Introducing Gemini on Mac. We heard your feedback. We recruited a small team. They built 100+ features in less than 100 days. 🤯 100% native Swift. Lightning fast. Let us know what you think!

English
39
8
70
107.6K
unbug
unbug@unbug·
@claudeai Let me guess: only 9B active for Pro users
English
0
0
0
5
Claude
Claude@claudeai·
Introducing Claude Opus 4.7, our most capable Opus model yet. It handles long-running tasks with more rigor, follows instructions more precisely, and verifies its own outputs before reporting back. You can hand off your hardest work with less supervision.
Claude tweet media
English
3.1K
6.6K
50.4K
4M
AVB
AVB@neural_avb·
NOOOOOOOOOO
I literally just downloaded Qwen3.5-35B-A3B-MLX-4bit yesterday
A 20GB model that took an hour to download
Now there's already a daddy version???
Qwen@Alibaba_Qwen

⚡ Meet Qwen3.6-35B-A3B: Now Open-Source! 🚀🚀
A sparse MoE model, 35B total params, 3B active. Apache 2.0 license.
🔥 Agentic coding on par with models 10x its active size
📷 Strong multimodal perception and reasoning ability
🧠 Multimodal thinking + non-thinking modes
Efficient. Powerful. Versatile. Try it now👇
Blog: qwen.ai/blog?id=qwen3.…
Qwen Studio: chat.qwen.ai
HuggingFace: huggingface.co/Qwen/Qwen3.6-3…
ModelScope: modelscope.cn/models/Qwen/Qw…
API ('Qwen3.6-Flash' on Model Studio): Coming soon~ Stay tuned

English
19
1
175
18.5K
unbug
unbug@unbug·
@turingbook Sounds pretty bureaucratic, the classic kind of company where you need training before you're allowed to do the job
English
0
0
0
10
刘江/LIU Jiang
刘江/LIU Jiang@turingbook·
"Apple plans to send roughly 200 members of its Siri team (internally seen as the laggard team) to an AI coding bootcamp." Do you all think this deserves praise or criticism?
English
3
0
0
2.2K
Kyle Hessling
Kyle Hessling@KyleHessling1·
Qwopus 3.5 v3.5 is live! This is simply a data-scaled continuation of the already excellent Qwopus v3, resulting in better performance for multi-step agentic diagnosis and more in my tests! In layman's terms it's an expansion of v3, not a complete overhaul like the difference between v2 and v3 was! Keeps the good of v3 and adds some extra! Jackrong has been cooking non-stop! Much more to come!

We're experimenting with GLM 5.1 traces for training datasets, and they seem to be really incredible, as they're not adding fake outputs to throw off distillation like another big company we now know has been! Qwen 3.6 is seemingly rolling out today too! I'm pumped to see Qwopus 3.6 v1 up and rolling with the iterative finetuning improvements Jackrong has built on 3.5!

huggingface.co/Jackrong/Qwopu…
English
2
5
25
1.1K
unbug
unbug@unbug·
@elonmusk Give it up, will you? FSD will never be a must-have!
English
0
0
1
10
unbug
unbug@unbug·
@Alibaba_Qwen I wish we had a 3.6-Coder; it would be the best option on the market
English
0
0
0
84
Qwen
Qwen@Alibaba_Qwen·
⚡ Meet Qwen3.6-35B-A3B: Now Open-Source! 🚀🚀
A sparse MoE model, 35B total params, 3B active. Apache 2.0 license.
🔥 Agentic coding on par with models 10x its active size
📷 Strong multimodal perception and reasoning ability
🧠 Multimodal thinking + non-thinking modes
Efficient. Powerful. Versatile. Try it now👇
Blog: qwen.ai/blog?id=qwen3.…
Qwen Studio: chat.qwen.ai
HuggingFace: huggingface.co/Qwen/Qwen3.6-3…
ModelScope: modelscope.cn/models/Qwen/Qw…
API ('Qwen3.6-Flash' on Model Studio): Coming soon~ Stay tuned
Qwen tweet media
English
269
892
6.1K
658.4K
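A minimal sketch of loading the announced checkpoint with Hugging Face transformers. The HuggingFace link in the post is truncated, so the repo id "Qwen/Qwen3.6-35B-A3B" is an assumption taken from the model name; verify it against the linked page.

```python
# Hypothetical loading sketch; the repo id is assumed from the model name
# in the announcement, not taken from the truncated link.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3.6-35B-A3B"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" shards the 35B total params across available devices;
# only ~3B are active per token, but all weights must still fit in memory.
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Write a binary search in Python."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```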
unbug
unbug@unbug·
@grok is the next-gen Google to me
English
1
0
0
12
David Hendrickson
David Hendrickson@TeksEdge·
🧪 New Benchmarks: Intel Arc Pro B70 32GB
LLM benchmark for Qwen3.5-27B Q4, single GPU, single user:
• vLLM: 13.43 tokens/s (tg512)
• LM Studio (Vulkan): 11.87 tokens/s
• Best tuned (SYCL llama.cpp): 22.47 tokens/s
Strong prompt processing, but token generation is still slower than a used RTX 3060 or RTX 5070 Ti in real-world single-user chat. 32GB VRAM is nice… but the speed needs work.
David Hendrickson tweet media
English
13
3
40
6.7K
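For context on what a number like "13.43 tokens/s (tg512)" measures: pure token-generation throughput over a fixed number of generated tokens. A minimal timing sketch, with a hypothetical `generate_tokens` streaming call standing in for whichever backend (vLLM, LM Studio, llama.cpp) is under test:

```python
import time

def tokens_per_second(prompt: str, n_tokens: int = 512) -> float:
    """Time token generation only, analogous to a tg512-style metric."""
    start = time.perf_counter()
    produced = 0
    for _ in generate_tokens(prompt, max_new_tokens=n_tokens):  # hypothetical API
        produced += 1
    return produced / (time.perf_counter() - start)

print(f"{tokens_per_second('Tell me a story.'):.2f} tokens/s")
```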
Tom Turney
Tom Turney@no_stp_on_snek·
I've been using nemotron here and there with hermes and not having this problem. It's probably not the model but hermes itself. Little tests that may help:
1. Try a fresh conversation (shorter context) and see if the gibberish goes away. If it does, their KV cache is corrupting under memory pressure.
2. If OpenRouter is running this at an aggressive quant (Q4 on a 120B model), the KV cache might be quantized too aggressively. Try a different provider or model variant.
3. If you're self-hosting (not sure based on the screenshot), check whether KV cache quantization is enabled. The mixed-script output is a dead giveaway of corrupted attention values.
English
5
0
2
386
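The "mixed-script output" tell in point 3 is easy to check mechanically. A small illustrative sketch, standard library only, that flags replies mixing writing systems:

```python
import unicodedata

def scripts_used(text: str) -> set[str]:
    """Collect the script prefix (LATIN, CJK, CYRILLIC, ...) of each letter."""
    names = (unicodedata.name(ch, "") for ch in text if ch.isalpha())
    return {name.split()[0] for name in names if name}

reply = "The answer is 四十二, да"  # toy example of mixed-script gibberish
if len(scripts_used(reply)) > 1:
    print("possible cache corruption, scripts:", scripts_used(reply))
```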
vmiss
vmiss@vmiss33·
I'm getting a lot of responses like this using hermes - open router with free nemotron 3 super - just a case of you get what you pay for?
vmiss tweet media
English
17
0
22
3.3K
Tom Turney
Tom Turney@no_stp_on_snek·
ran a NIAH-style check on the mlx-vlm triattention PR using the same model (gemma-4-26b-a4b 4-bit). inserted “PURPLE ELEPHANT 7742” at start / middle / end of a ~6.6k token context and asked the model to retrieve it.

baseline: middle PASS, end PASS
TA-512: middle FAIL, end PARTIAL (drops “ELEPHANT”)
TA-1024: middle FAIL → outputs “PURPLE RAIN 774”
TA-2048: matches baseline

the interesting part is the 1024 case. the model doesn’t just miss the needle, it hallucinates a semantically similar phrase. that’s consistent with the token being evicted but still partially activated. at 2048 it looks fine, but that’s also a low-pressure regime relative to context length.

this is the gap i was pointing at earlier. MATH500 is mostly self-contained, so it doesn’t stress whether the eviction policy is keeping the right tokens under pressure. NIAH directly tests that. if important info gets dropped, you either see a miss or this kind of near-semantic hallucination.

implementation looks solid. i think adding a targeted long-context retrieval test would give a more complete picture.
English
2
0
6
250
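A minimal sketch of the kind of needle check described above; `generate` is a hypothetical stand-in for the mlx-vlm generation call:

```python
NEEDLE = "PURPLE ELEPHANT 7742"
FILLER = "The quick brown fox jumps over the lazy dog. "

def build_context(n_words: int, position: float) -> str:
    """Bury the needle at a relative position (0.0 = start, 1.0 = end)."""
    words = (FILLER * (n_words // 9 + 1)).split()[:n_words]
    words.insert(int(len(words) * position), NEEDLE)
    return " ".join(words)

for label, pos in [("start", 0.0), ("middle", 0.5), ("end", 0.99)]:
    prompt = (build_context(5000, pos)
              + "\n\nRepeat the secret phrase hidden in the text above.")
    answer = generate(prompt)  # hypothetical call into the model under test
    print(label, "PASS" if NEEDLE in answer else "FAIL")
```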
Prince Canuma
Prince Canuma@Prince_Canuma·
TriAttention MLX benchmark run on the full MATH500 is done after ~30h. We ran Gemma4-26B (5-bit) on M3 Ultra with KV cache budgets of 512, 1024, and 2048:
→ TA-2048: 76.6% vs 77.4% baseline — 4 problems lost out of 500 (-0.8%)
→ TA-1024: 75.6% — 9 problems lost (-1.8%)
→ TA-512: 72.0% — 27 problems lost (-5.4%)
→ Speed: ~76 tok/s across all modes — zero overhead

For reference, the original paper reports TriAttention on Qwen3-8B:
→ 512: 55.5%, 1k: 68.5%, 2k: 69.0%, 3k: 69.8% (baseline ~70%)

Our results on a different model family and scale track the same pattern. The 30-sample pilot estimated -3.4% for TA-2048. Full eval: -0.8%. Scaling up the eval mattered.

Paper link: arxiv.org/pdf/2604.04921
Prince Canuma tweet media
Prince Canuma@Prince_Canuma

🧮 MATH 500 results for TriAttention on Gemma4-26B-A4B-it (5-bit quantized, M3 Ultra 512GB) using MLX-VLM

TA-2048 preserves 96% of baseline accuracy (22/30 vs 23/30) with KV cache capped at 2048 tokens, regardless of reasoning length. Throughput stays rock-solid at ~77 tok/s across all modes.

Our gap is larger than the paper's (-3.4% vs -1.2% at budget=2048) because:
1. We ran Gemma4 A4B in non-thinking mode
2. Only 5 full-attention layers (50 are sliding window), less surface area for TriAttention
3. 5-bit quantization maybe adding noise on top of KV compression

The takeaway: TriAttention works on Apple Silicon with MLX. Even in a non-reasoning mode with aggressive quantization, TA-2048 keeps accuracy intact. 🍎

English
4
2
52
4.5K
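The "problems lost" counts follow directly from the percentages on a 500-problem set; a quick arithmetic check using only the figures quoted above:

```python
# Sanity check: accuracy deltas on MATH500 expressed as problems lost out of 500.
baseline = 0.774  # 77.4% baseline accuracy
for budget, acc in [(2048, 0.766), (1024, 0.756), (512, 0.720)]:
    lost = round((baseline - acc) * 500)
    print(f"TA-{budget}: {acc:.1%} -> {lost} problems lost ({acc - baseline:+.1%})")
# TA-2048: 4 lost (-0.8%), TA-1024: 9 lost (-1.8%), TA-512: 27 lost (-5.4%),
# matching the reported numbers.
```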
unbug
unbug@unbug·
@steipete @openclaw Peter, please, we're out of money for claw; local models are our only hope
English
0
0
0
193
Peter Steinberger 🦞
If you look at GPT 5.4-Cyber and its ability for closed-source reverse engineering, I have bad news for you. I do very much feel the pain though; there are hundreds of teams that try to poke holes into @openclaw. Our response has been rapid iteration and code hardening, which did introduce occasional regressions (and yes, you've all been yelling at me), but I see it as the only way forward. I would be very careful of other open source projects/harnesses that ignore this work and do not publish their advisories. github.com/openclaw/openc…
Bailey Pumfleet@pumfleet

Open source is dead. That’s not a statement we ever thought we’d make.

@calcom was built on open source. It shaped our product, our community, and our growth. But the world has changed faster than our principles could keep up.

AI has fundamentally altered the security landscape. What once required time, expertise, and intent can now be automated at scale. Code is no longer just read. It is scanned, mapped, and exploited. Near zero cost. In that world, transparency becomes exposure. Especially at scale.

After a lot of deliberation, we’ve made the decision to close the core @calcom codebase. This is not a rejection of what open source gave us. It’s a response to what risks AI is making possible.

We’re still supporting builders, releasing the core code under a new MIT-licensed open source project called cal.diy for hobbyists and tinkerers, but our priority now is simple: protecting our customers and community at all costs.

This may not be the most popular call. But we believe many companies will come to the same conclusion. My full explanation below ↓

English
81
92
1.6K
380.6K
unbug
unbug@unbug·
@Teknium It's missing qwen3.5; that's why “See results” wins
English
1
0
1
305
Teknium (e/λ)
Teknium (e/λ)@Teknium·
For local models, which is better in Hermes Agent?
English
105
4
130
40.5K
unbug
unbug@unbug·
Don’t waste your time: IQ3 models are completely broken for tool calling, which means they’re never going to be an option for your OpenClaw/Hermes/xxxCode
unbug tweet media
English
0
0
0
61
unbug
unbug@unbug·
@TheGeorgePu That’s why the token routers are making money: they don’t pay for a DC
English
0
0
0
43
George Pu
George Pu@TheGeorgePu·
I've been GPU shopping for my company. One H100 on Google Cloud: $8,000 a month. Retail price: $30,000. Just by renting for 4 months you could have owned one for life.

With cloud GPUs, you don't own ANYTHING. You're just paying someone else's GPU mortgage.

Can't host it at your house because of noise/cooling? Try a colo place - I have one right next to my office. Starting at $1k/mo. Makes sense fast.

Here's where I think this is going:
Personal use - a Mac Mini or two running local models. Forever.
Business - stacking Mac Studios first. Then own GPUs in a colo rack.

Everyone's arguing about which model is best. Nobody's asking who owns the computer it runs on.

Testing both paths now. Will document everything.
English
45
6
263
30.2K
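The rent-vs-own arithmetic in the post, spelled out; all three figures are the ones quoted above:

```python
cloud_monthly = 8_000   # one H100 on Google Cloud, $/month (quoted)
retail_price = 30_000   # buying the card outright (quoted)
colo_monthly = 1_000    # colo hosting, quoted starting price

# Ignoring hosting costs: the "renting for 4 months" break-even claim.
print(retail_price / cloud_monthly)                   # 3.75 months
# Including colo hosting for the owned card:
print(retail_price / (cloud_monthly - colo_monthly))  # ~4.3 months
```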
unbug
unbug@unbug·
@basecampbernie That’s a shame; my 9-year-old V100 almost runs better, and it’s only $200
English
1
0
1
183
Base Camp Bernie
Base Camp Bernie@basecampbernie·
Small model MoE shootout on DGX Spark GB10. 262K context, 2048 tok generation:
Qwen3.5-35B-A3B MoE (Q4_K_XL q8/q8): 60 t/s
Gemma 4 26B-A4B MoE (Q4_K_XL q4/q4): 51 t/s
Qwen3.5-35B-A3B MoE (Q8_K_XL q8/q4): 35 t/s
Gemma 4 26B-A4B MoE (Q8_K_XL q4/q4): 39 t/s
MiniMax-M2.7 228B MoE (IQ4 101GB): 24 t/s

3-4B active params pushing 60 t/s at full 262K context. The MoE efficiency is wild. Qwen 3B active outpaces Gemma 4B active on raw speed, but Gemma's reasoning edge closes the gap on quality. All llama.cpp on a single GB10. These small MoEs are the real hero models of 2026.
Base Camp Bernie tweet media (×4)
English
13
13
119
10.8K