warmshao
652 posts


Introducing 𝑨𝒕𝒕𝒆𝒏𝒕𝒊𝒐𝒏 𝑹𝒆𝒔𝒊𝒅𝒖𝒂𝒍𝒔: Rethinking depth-wise aggregation. Residual connections have long relied on fixed, uniform accumulation. Inspired by the duality of time and depth, we introduce Attention Residuals, replacing standard depth-wise recurrence with learned, input-dependent attention over preceding layers. 🔹 Enables networks to selectively retrieve past representations, naturally mitigating dilution and hidden-state growth. 🔹 Introduces Block AttnRes, partitioning layers into compressed blocks to make cross-layer attention practical at scale. 🔹 Serves as an efficient drop-in replacement, demonstrating a 1.25x compute advantage with negligible (<2%) inference latency overhead. 🔹 Validated on the Kimi Linear architecture (48B total, 3B activated parameters), delivering consistent downstream performance gains. 🔗Full report: github.com/MoonshotAI/Att…



me stepping down. bye my beloved qwen.

Z.ai Startup Program is NOW OPEN. What you can get: ·Free API credits ·Priority rate limits ·Exclusive Community ·Early API Access Who we're looking for: ·AI-native startups ·Agent builders ·SaaS founders integrating LLM infra ·Global teams building for real-world scale If you're building something that matters, don't wait!! Apply now: startup.z.ai Questions? Details? Follow & DM @ZaiforStartups

🚀 Introducing the Qwen 3.5 Small Model Series Qwen3.5-0.8B · Qwen3.5-2B · Qwen3.5-4B · Qwen3.5-9B ✨ More intelligence, less compute. These small models are built on the same Qwen3.5 foundation — native multimodal, improved architecture, scaled RL: • 0.8B / 2B → tiny, fast, great for edge device • 4B → a surprisingly strong multimodal base for lightweight agents • 9B → compact, but already closing the gap with much larger models And yes — we’re also releasing the Base models as well. We hope this better supports research, experimentation, and real-world industrial innovation. Hugging Face: huggingface.co/collections/Qw… ModelScope: modelscope.cn/collections/Qw…


🚀 Qwen3.5-397B-A17B is here: The first open-weight model in the Qwen3.5 series. 🖼️Native multimodal. Trained for real-world agents. ✨Powered by hybrid linear attention + sparse MoE and large-scale RL environment scaling. ⚡8.6x–19.0x decoding throughput vs Qwen3-Max 🌍201 languages & dialects 📜Apache2.0 licensed 🔗Dive in: GitHub: github.com/QwenLM/Qwen3.5 Chat: chat.qwen.ai API:modelstudio.console.alibabacloud.com/ap-southeast-1… Qwen Code: github.com/QwenLM/qwen-co… Hugging Face: huggingface.co/collections/Qw… ModelScope: modelscope.cn/collections/Qw… blog: qwen.ai/blog?id=qwen3.5

在vibesurf配置好qwen-asr后,配合上vibesurf skill,你就可以无缝在openclaw中使用语音识别了,这样可以在discord或飞书发送语音让claw🦞干活了






