Chen Cheng

736 posts

Chen Cheng

@cherry_cc12

contributor of Qwen

San Jose, CA Katılım Mart 2023

169 Takip Edilen6.2K Takipçiler

Chen Cheng@cherry_cc12·5h

@JustinLin610 🥦-pro, need a wine pairing🍷

English

1.3K

Junyang Lin@JustinLin610·11h

this is a huge broccoli 🥦

English

412

23.5K

Chen Cheng@cherry_cc12·1d

@awnihannun Congrats!

English

402

Awni Hannun@awnihannun·1d

I joined Anthropic as a member of the technical staff. Excited to work on frontier modeling at a place with unwavering values and a generational mission.

English

204

2.2K

111.9K

Chen Cheng@cherry_cc12·3 Mar

@Eduardopto no

15.2K

Ed@Eduardopto·3 Mar

@cherry_cc12 Just because he thanked Elon? 😭 Wow

English

15.2K

Chen Cheng@cherry_cc12·3 Mar

Finally — yes, finally — our GPTQ-Int4 weights are here 🔥 The Qwen3.5 series maintains near-lossless accuracy under 4-bit weight and KV cache quantization. In terms of long-context efficiency: • Qwen3.5-27B supports 800K+ context length • Qwen3.5-35B-A3B exceeds 1M context on consumer-grade GPUs with 32GB VRAM • Qwen3.5-122B-A10B supports 1M+ context length on server-grade GPUs with 80GB VRAM

Qwen@Alibaba_Qwen

🔥 Qwen 3.5 Series GPTQ-Int4 weights are live. Native vLLM & SGLang support. ⚡️ Less VRAM. Faster inference. Run powerful models on limited-GPU setups. 👇 Grab the weights + example code: Hugging Face: huggingface.co/collections/Qw… ModelScope: modelscope.cn/collections/Qw…

English

404

40.3K

Chen Cheng@cherry_cc12·2 Mar

❤️

Elon Musk@elonmusk

@Alibaba_Qwen Impressive intelligence density

ART

6.2K

Chen Cheng@cherry_cc12·2 Mar

We just dropped the Qwen3.5 Small series — 0.8B / 2B / 4B / 9B 🚀 Small doesn’t mean limited anymore. Would love to hear what you build with them 👀

Qwen@Alibaba_Qwen

🚀 Introducing the Qwen 3.5 Small Model Series Qwen3.5-0.8B · Qwen3.5-2B · Qwen3.5-4B · Qwen3.5-9B ✨ More intelligence, less compute. These small models are built on the same Qwen3.5 foundation — native multimodal, improved architecture, scaled RL: • 0.8B / 2B → tiny, fast, great for edge device • 4B → a surprisingly strong multimodal base for lightweight agents • 9B → compact, but already closing the gap with much larger models And yes — we’re also releasing the Base models as well. We hope this better supports research, experimentation, and real-world industrial innovation. Hugging Face: huggingface.co/collections/Qw… ModelScope: modelscope.cn/collections/Qw…

English

247

10.6K

Chen Cheng@cherry_cc12·1 Mar

٩(◕‿◕｡)۶

Junyang Lin@JustinLin610

225

19.1K

Chen Cheng@cherry_cc12·1 Mar

Cool! GUI operations will better unlock the last mile of multimodal Agent capabilities — whether on mobile or PC.

BijanBowen@Ominousind

Android control with DGX Spark & Qwen3.5-27B with a simple web UI - Sped up about 4x.

English

5.6K

Chen Cheng@cherry_cc12·28 Şub

@KSimback @iamfakeguru Looking forward to seeing your test results!❤️

English

Kevin Simback 🍷@KSimback·28 Şub

@iamfakeguru I have a desktop with a 4090 and plenty of RAM, will be testing when I get home in a few days

English

2.9K

Kevin Simback 🍷@KSimback·28 Şub

The math is mathing even more now! Seeing many positive reports of running Qwen 35B-A3B locally on modest consumer hardware No need for a $10k+ Mac Studio So you get a Sonnet 4.5 grade model that can run privately at home, then you can chat with it on your phone via Tailscale

LM Studio@lmstudio

Qwen3.5-35B-A3B is now available in LM Studio! This model outperforms previous Qwen models that are more than 6x its size 🤯🚀 Requires about ~21GB to run locally. lmstudio.ai/models/qwen/qw…

English

1.2K

169.3K

Chen Cheng@cherry_cc12·28 Şub

With Qwen3-TTS Voice Design, you can shape tone and richness just using text. If you want more consistency: Generate the first segment with Voice Design, then continue with Voice Clone. Hope you enjoy it — would love to hear the fun voice styles you come up with. 🎙️

Daily Dose of Data Science@DailyDoseOfDS_

Big moment for text-to-speech. Qwen open-sourced a TTS model that lets you clone voices, design new ones & control speech using natural language. You can ask it "speak in a cheerful tone with slight nervousness," and it actually does that. No complex audio engineering needed!

English

4.5K

Chen Cheng@cherry_cc12·28 Şub

Glad to see Qwen 3.5 35B live on Venice 🚀

Venice@AskVenice

Qwen 3.5 35B is now available on Venice. Fast, efficient, and multilingual. 262K context window, multimodal input, and strong coding and agent performance — all fully private on Venice.

English

3.3K

Chen Cheng@cherry_cc12·28 Şub

More Qwen3.5 INT4 models — now optimized by Intel. 🚀

Haihao Shen@HaihaoShen

🎯More Qwen3.5 INT4 models are also available now! huggingface.co/Intel/Qwen3.5-… huggingface.co/Intel/Qwen3.5-… huggingface.co/Intel/Qwen3.5-… @Alibaba_Qwen @JustinLin610

English

5.7K

Chen Cheng@cherry_cc12·28 Şub

@bao_xiao78791 啊我看下

日本語

305

xiaobao@bao_xiao78791·28 Şub

Content security warning: input data may contain inappropriate content!怎么用google登录qwen登录不了呢😂，显示这个错误

中文

386

Chen Cheng@cherry_cc12·28 Şub

Multimodal. Right-sized. It's a solid start for the 397B model, but not quite there yet. Keep improving, Chong!

Arena.ai@arena

Top 10 Open Models: February 2026 in Text Arena. The top 3 labs have not changed since January, but the scores have gotten tighter between them: - @Zai_org's GLM-5, scoring 1455 - @Alibaba_Qwen's Qwen-3.5 397B A17B, scoring1454 - @Kimi_Moonshot's Kimi-K2.5 Thinking, 1452 The spread widens from there. The open leaderboard remains tightly clustered at the top, single-digit swings can reshuffle the overall rankings. See thread for more details on shifts this month.

English

57.3K

Chen Cheng retweetledi

Sudo su@sudoingX·27 Şub

look what a single consumer GPU just built. gave Qwen3.5-35B-A3B one prompt: build a cloud GPU marketplace with pricing cards, deploy templates, and a benchmark leaderboard. it planned the layout, wrote the animations, populated the data, and served it. one shot. one HTML file. then i told it to iterate. split the hero, add a floating GPU with neural network animation. glassmorphism on the cards. done. done. done. three rounds, no confusion, no regressions. 4-bit quantized. 19.7 GB. single RTX 3090. full coding agent claude code harness running on localhost. no API calls leaving my machine. no subscription. no rate limits. earlier today i pointed it at my own production website. it curled the HTML, found every broken link, and told me "pretty shell, empty core. would not recommend." then built a better version from scratch. local inference stops being a demo when you actually steer it. the models are there. they understand intent. but you have to meet them halfway with good prompts, clear context, and real project structure. that's the skill gap now. not the models. the steering. more experiments coming. i genuinely cannot stop playing with this thing.

Sudo su@sudoingX

this is the worst local AI will ever be. tomorrow it gets faster. next month the models get smarter. next year your GPU runs what a data center runs today. Qwen3.5-35B-A3B on a single 3090. told it to visualize its own expert routing. 256 experts, 8 active per token, rendered in 3D on the same GPU running inference. no API key. no subscription. no permission needed. closed AI isn't losing ground. it's losing the argument.

English

255

37.1K

Chen Cheng@cherry_cc12·27 Şub

OpenClaw + Qwen3.5 🦞

Alex Finn@AlexFinn

x.com/i/article/2027…

Filipino

101

20K

Chen Cheng@cherry_cc12·27 Şub

@RaghavKoch19380 @Alibaba_Qwen We'll release smaller dense models firstly. For 4-bit quantization, can you use community version for now(

English

🧟@RaghavKoch19380·27 Şub

@cherry_cc12 @Alibaba_Qwen any updates?

English

Qwen@Alibaba_Qwen·25 Şub

🔥 Qwen 3.5 Medium Model Series FP8 weights are now open and ready for deployment！ Native support for vLLM and SGLang. Check the model card for example code. ⚡️ Optimize your workflow with FP8 precision. 👇 Get the weights: Hugging Face：huggingface.co/collections/Qw… ModelScope：modelscope.cn/collections/Qw…

English

701

66.8K

Chen Cheng@cherry_cc12·26 Şub

Oh, I really like the picture!(✿◡‿◡)

Wildminder@wildmindai

RIP Monthly Fees. Really! Awesome combo Qwen3.5-35B + openclaw. Run it once, use it all day: $0 per prompt You don’t need huge/expensive models for your daily tasks: - Email +writing - Summaries - Non-stop coding - Agentic automation - Screenshots/images - Multilingual - No limits (except speed and electricity bill) Yeah, not super fast, but works well on 16GB VRAM + 64GB RAM.

English

1.6K

Chen Cheng@cherry_cc12·26 Şub

This shift is so real. Building a local camera analysis system with Qwen3-VL has been amazing! Now on weekends, I sit with my kid and turn his drawings and wild ideas into tiny playable games in minutes by Code Agent + Qwen. When code isn’t a barrier and time isn't the cost, life just gets a lot more colorful. 🧒🎨🎮

Andrej Karpathy@karpathy

It is hard to communicate how much programming has changed due to AI in the last 2 months: not gradually and over time in the "progress as usual" way, but specifically this last December. There are a number of asterisks but imo coding agents basically didn’t work before December and basically work since - the models have significantly higher quality, long-term coherence and tenacity and they can power through large and long tasks, well past enough that it is extremely disruptive to the default programming workflow. Just to give an example, over the weekend I was building a local video analysis dashboard for the cameras of my home so I wrote: “Here is the local IP and username/password of my DGX Spark. Log in, set up ssh keys, set up vLLM, download and bench Qwen3-VL, set up a server endpoint to inference videos, a basic web ui dashboard, test everything, set it up with systemd, record memory notes for yourself and write up a markdown report for me”. The agent went off for ~30 minutes, ran into multiple issues, researched solutions online, resolved them one by one, wrote the code, tested it, debugged it, set up the services, and came back with the report and it was just done. I didn’t touch anything. All of this could easily have been a weekend project just 3 months ago but today it’s something you kick off and forget about for 30 minutes. As a result, programming is becoming unrecognizable. You’re not typing computer code into an editor like the way things were since computers were invented, that era is over. You're spinning up AI agents, giving them tasks *in English* and managing and reviewing their work in parallel. The biggest prize is in figuring out how you can keep ascending the layers of abstraction to set up long-running orchestrator Claws with all of the right tools, memory and instructions that productively manage multiple parallel Code instances for you. The leverage achievable via top tier "agentic engineering" feels very high right now. It’s not perfect, it needs high-level direction, judgement, taste, oversight, iteration and hints and ideas. It works a lot better in some scenarios than others (e.g. especially for tasks that are well-specified and where you can verify/test functionality). The key is to build intuition to decompose the task just right to hand off the parts that work and help out around the edges. But imo, this is nowhere near "business as usual" time in software.

English

Chen Cheng retweetledi

Ivan Fioravanti ᯅ@ivanfioravanti·25 Şub

MLX + OpenCode + Qwen3.5-122B-A10B-4bit on M3 Ultra created a great snake game! Work zero-shot. Video clearly in super fast mode during generation. I generated the prompt using Grok 4.20, it's in the article.

Ivan Fioravanti ᯅ@ivanfioravanti

x.com/i/article/2026…

English

397

74.7K

Keşfet

@JustinLin610 @awnihannun @Eduardopto @KSimback @iamfakeguru @bao_xiao78791 @elonmusk @BarackObama