Junyang Lin
@JustinLin610
3.1K posts
❤️ 🍵 ☕️ 🍷 🥃
Joined December 2015
2K Following · 80.1K Followers
Junyang Lin@JustinLin610·
this is a huge broccoli 🥦
[image]
29 replies · 5 reposts · 386 likes · 20.4K views
Fuli Luo@_LuoFuli·
MiMo-V2-Pro & Omni & TTS is out. Our first full-stack model family built truly for the Agent era.

I call this a quiet ambush — not because we planned it, but because the shift from Chat to Agent paradigm happened so fast, even we barely believed it. Somewhere in between was a process that was thrilling, painful, and fascinating all at once.

The 1T base model started training months ago. The original goal was long-context reasoning efficiency. Hybrid Attention carries real innovation, without overreaching — and it turns out to be exactly the right foundation for the Agent era. 1M context window. MTP inference for ultra-low latency and cost. These architectural decisions weren't trendy. They were a structural advantage we built before we needed it.

What changed everything was experiencing a complex agentic scaffold — what I'd call orchestrated Context — for the first time. I was shocked on day one. I tried to convince the team to use it. That didn't work. So I gave a hard mandate: anyone on MiMo Team with fewer than 100 conversations tomorrow can quit. It worked. Once the team's imagination was ignited by what agentic systems could do, that imagination converted directly into research velocity.

People ask why we move so fast. I saw it firsthand building DeepSeek R1. My honest summary:
— Backbone and Infra research has long cycles. You need strategic conviction a year before it pays off.
— Posttrain agility is a different muscle: product intuition driving evaluation, iteration cycles compressed, paradigm shifts caught early.
— And the constant: curiosity, sharp technical instinct, decisive execution, full commitment — and something that's easy to underestimate: a genuine love for the world you're building for.

We will open-source — when the models are stable enough to deserve it.

From Beijing, very late, not quite awake.
237 replies · 349 reposts · 3.9K likes · 1.1M views
Junyang Lin retweeted
Jerry Tworek@MillionInt·
Being brave can be a moat. Especially in a world where top management of most companies isn’t.
17 replies · 13 reposts · 443 likes · 37.4K views
Guodong Zhang@Guodzh·
Last day at xAI. Wild journey over the past three years, but excited about the next chapter. Thanks all for the love and support yesterday. So many friends made along the way, and I will miss you all!
236 replies · 62 reposts · 2.5K likes · 650.4K views
Junyang Lin retweeted
Lei Li@_TobiasLee·
Agents are doing real work, but existing benchmarks still test them in isolation. Today we're releasing Claw-Eval 🦞: an open-source, transparent evaluation framework for AI agents.

We feature 104 tasks spanning daily assistants, Office QA, deep finance research, and terminal usage. We test completion, robustness, and safety across real and mock services with configurable error injection. Fully traceable and human-verified.

First leaderboard results: Claude Opus 4.6 @AnthropicAI tops pass rate (68.3%), but Gemini 3.1 @GeminiApp Pro edges it on avg score (0.764 vs 0.759). Agents have a long way to go. 🤨

Check it out: claw-eval.github.io @steipete @openclaw
[image]
10 replies · 26 reposts · 135 likes · 33.2K views
Junyang Lin@JustinLin610·
sry for missing messages. will respond asap
99 replies · 11 reposts · 819 likes · 89.3K views
Junyang Lin@JustinLin610·
me stepping down. bye my beloved qwen.
1.7K replies · 741 reposts · 13.6K likes · 6.5M views
Junyang Lin retweeted
ollama@ollama·
The Qwen 3.5 small models are available on Ollama. All models support native tool calling, thinking, and multimodal capabilities in Ollama.

9B: ollama run qwen3.5:9b
4B: ollama run qwen3.5:4b
2B: ollama run qwen3.5:2b
0.8B: ollama run qwen3.5:0.8b

Model page, including the bigger Qwen 3.5 models: ollama.com/library/qwen3.5
32 replies · 85 reposts · 1.2K likes · 248.1K views
Junyang Lin retweeted
独立开发者William@DLKFZWilliam2·
Someone has already built an Android Agent directly with Qwen3.5-27B + DGX Spark: tasks are issued through a Web UI, and the model reads the screen, makes decisions, and taps on its own, with 4x faster inference. 🤯
60 replies · 286 reposts · 1.7K likes · 464.7K views
Awni Hannun@awnihannun·
Today is my last day at Apple. Building MLX with our amazing team and community has been an absolute pleasure.

It's still early days for AI on Apple silicon. Apple makes the best consumer hardware on the planet. There's so much potential for it to be the leading platform for AI. And I'm confident MLX will continue to have a big role in that.

To the future: MLX remains in the exceptionally capable hands of our team including @angeloskath, @zcbenz, @DiganiJagrit, @NasFilippova, @trebolloc (and others not on X). Follow them or @shshnkp for future updates.
[image]
260 replies · 94 reposts · 2.2K likes · 395.9K views
Junyang Lin retweeted
Artificial Analysis@ArtificialAnlys·
Alibaba has expanded its Qwen3.5 model family with 3 new models - the 27B model is a standout, scoring 42 on the Artificial Analysis Intelligence Index and matching open weights models 8-25x its size

@Alibaba_Qwen has expanded the Qwen3.5 family with three new models alongside the 397B flagship released earlier this month: the Qwen3.5 27B (Dense, scoring 42 on Intelligence Index), Qwen3.5 122B A10B (MoE, 42), and Qwen3.5 35B A3B (MoE, 37). The two MoE (Mixture-of-Experts) models only activate a fraction of the total parameters per forward pass (10B of 122B and ~3B of 35B respectively). The Intelligence Index is our synthesis metric incorporating 10 evaluations covering general reasoning, agentic tasks, coding, and scientific reasoning.

All models are Apache 2.0 licensed, natively support 262K context, and return to the unified thinking/non-thinking hybrid architecture from the original Qwen3, after Alibaba moved to separate Instruct and Reasoning checkpoints with the Qwen3 2507 updates.

Key benchmarking results for the reasoning variants:

➤ Qwen3.5 27B scores 42 on Intelligence Index and is the most intelligent model under 230B. The nearest model of similar size is GLM-4.7-Flash (31B total, 3B active) which scores 30. Open weights models of equivalent intelligence are 8-25x larger in terms of total parameters: MiniMax-M2.5 (230B, 42), DeepSeek V3.2 (685B, 42), and GLM-4.7 (357B, 42). In FP8 precision it takes ~27GB to store the model weights, while in 4-bit quantization you can use laptop-quality hardware with 16GB+ of RAM

➤ Qwen3.5 27B scores 1205 on GDPval-AA (Agentic Real-World Work Tasks), placing it alongside larger models. For context, MiniMax-M2.5 scores 1206, GLM-4.7 (Reasoning) scores 1200, and DeepSeek V3.2 (Reasoning) scores 1194. This is particularly notable for a 27B parameter model and suggests strong agentic capability for its size. GDPval-AA tests models on real-world tasks across 44 occupations and 9 major industries

➤ AA-Omniscience remains a relative weakness across the Qwen3.5 family, driven primarily by lower accuracy rather than hallucination rate. Qwen3.5 27B scores -42 on AA-Omniscience, comparable to MiniMax-M2.5 (-40) but behind DeepSeek V3.2 (-21) and GLM-4.7 (-35). Although Qwen3.5 27B's hallucination rate (80%) is lower than peers (GLM-4.7 90%, MiniMax 89%, DeepSeek 82%), its accuracy is also lower at 21% vs 34% for DeepSeek V3.2 and 29% for GLM-4.7. This is likely a consequence of model size - we have generally observed that models with more total parameters perform better on accuracy in AA-Omniscience, as broader knowledge recall benefits from larger parameter counts

➤ Qwen3.5 27B is equivalently intelligent to Qwen3.5 122B A10B. The 122B A10B is a Mixture-of-Experts model that only activates 10B of its 122B total parameters per forward pass. The 27B model leads in GDPval-AA (1205 Elo vs 1145 Elo) and slightly on TerminalBench (+1.5 p.p.), while the 122B model leads on SciCode (+2.5 p.p.), HLE (+1.2 p.p.), and has a lower hallucination rate (Omniscience -40 vs -42)

➤ Qwen3.5 35B A3B (Reasoning, 37) is the most intelligent model with ~3B active parameters, 7 points ahead of GLM-4.7-Flash (30). Other models in this ~3B active category include Qwen3 Coder Next (80B total, 28), Qwen3 Next 80B A3B (27), and NVIDIA Nemotron 3 Nano 30B A3B (24)

➤ Qwen3.5 27B used 98M output tokens to run the Intelligence Index, costing ~$299 via Alibaba Cloud API. This is notably high token usage compared to models at similar intelligence: MiniMax-M2.5 (56M), DeepSeek V3.2 (61M), and even the larger Qwen3.5 397B (86M).

Other information:
➤ Context window: 262K tokens (extendable to 1M via YaRN)
➤ License: Apache 2.0
➤ API pricing (Alibaba Cloud): 397B: $0.60/$3.60, 122B: $0.40/$3.20, 27B: $0.30/$2.40, 35B A3B: $0.25/$2.00 per 1M input/output tokens
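The ~27GB FP8 and 16GB-laptop 4-bit figures above are simple arithmetic: bytes per parameter times parameter count. A minimal sketch of that back-of-envelope calculation (the function name is my own, and it counts raw weights only, ignoring KV cache and runtime overhead):

```python
def weight_memory_gb(params_billions: float, bits_per_param: float) -> float:
    """Approximate GB needed to store raw model weights (1 GB = 1e9 bytes)."""
    return params_billions * 1e9 * bits_per_param / 8 / 1e9

# Qwen3.5 27B, as cited in the thread:
print(weight_memory_gb(27, 8))  # FP8   -> 27.0 GB
print(weight_memory_gb(27, 4))  # 4-bit -> 13.5 GB, hence 16GB+ RAM laptops
```

The same formula explains why the MoE variants are attractive: all 122B parameters of the 122B A10B must fit in memory, but only the 10B active ones are read per forward pass, so memory cost scales with total parameters while compute scales with active ones.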
[image]
23 replies · 77 reposts · 582 likes · 92.3K views
Junyang Lin@JustinLin610·
thx to nv for the support
NVIDIA AI Developer@NVIDIAAIDev

✨ Qwen3.5 — new from @Alibaba_Qwen — introduces a frontier‑class VLM built for native multimodal agents. With a ~400B‑parameter architecture combining MoE and Gated Delta Networks, Qwen3.5 can reason across text, code, and vision — and even understand and navigate user interfaces.

Learn how to:
✅ Run Qwen3.5 on free NVIDIA GPU endpoints
✅ Deploy with NIM
✅ Fine‑tune using NVIDIA NeMo

See the details in our technical blog ➡️ developer.nvidia.com/blog/develop-n…

3 replies · 5 reposts · 150 likes · 25.6K views