Chen Cheng

736 posts


@cherry_cc12

contributor to Qwen

San Jose, CA · Joined March 2023
169 Following · 6.2K Followers
Junyang Lin @JustinLin610
this is a huge broccoli 🥦
[image attached]
31 replies · 5 reposts · 412 likes · 23.5K views
Awni Hannun @awnihannun
I joined Anthropic as a member of the technical staff. Excited to work on frontier modeling at a place with unwavering values and a generational mission.
204 replies · 37 reposts · 2.2K likes · 111.9K views
Ed @Eduardopto
@cherry_cc12 Just because he thanked Elon? 😭 Wow
2 replies · 0 reposts · 30 likes · 15.2K views
Chen Cheng @cherry_cc12
Finally — yes, finally — our GPTQ-Int4 weights are here 🔥 The Qwen3.5 series maintains near-lossless accuracy under 4-bit weight and KV cache quantization.

In terms of long-context efficiency:
• Qwen3.5-27B supports 800K+ context length
• Qwen3.5-35B-A3B exceeds 1M context on consumer-grade GPUs with 32GB VRAM
• Qwen3.5-122B-A10B supports 1M+ context length on server-grade GPUs with 80GB VRAM
Qwen @Alibaba_Qwen

🔥 Qwen 3.5 Series GPTQ-Int4 weights are live. Native vLLM & SGLang support. ⚡️ Less VRAM. Faster inference. Run powerful models on limited-GPU setups. 👇 Grab the weights + example code: Hugging Face: huggingface.co/collections/Qw… ModelScope: modelscope.cn/collections/Qw…

14 replies · 22 reposts · 404 likes · 40.3K views
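The context-length figures above come down to KV-cache memory, which 4-bit quantization cuts by 4x versus FP16. A back-of-envelope sizing helper, using illustrative layer/head/dim numbers (assumptions for the sketch, not the real Qwen3.5 configs):

```python
# Back-of-envelope KV-cache size under k-bit quantization.
# The layer/head/dim numbers used below are illustrative
# assumptions, not the actual Qwen3.5 architecture.
def kv_cache_gib(context_len, n_layers, n_kv_heads, head_dim, bits=4):
    # 2 tensors per layer (K and V), bits/8 bytes per element
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * (bits / 8) * context_len
    return total_bytes / 2**30

# A hypothetical 36-layer model with 4 KV heads of dim 128 at 1M tokens:
print(round(kv_cache_gib(1_000_000, 36, 4, 128, bits=4), 1))   # 4-bit cache
print(round(kv_cache_gib(1_000_000, 36, 4, 128, bits=16), 1))  # FP16 cache
```

Under these assumptions the 4-bit cache is around 17 GiB at 1M tokens while FP16 would be four times that, which is why million-token contexts on a 32GB card are only plausible with both weights and KV cache quantized.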
Kevin Simback 🍷 @KSimback
@iamfakeguru I have a desktop with a 4090 and plenty of RAM, will be testing when I get home in a few days
2 replies · 0 reposts · 3 likes · 2.9K views
Kevin Simback 🍷 @KSimback
The math is mathing even more now! Seeing many positive reports of running Qwen 35B-A3B locally on modest consumer hardware. No need for a $10k+ Mac Studio. So you get a Sonnet 4.5-grade model that can run privately at home, then you can chat with it on your phone via Tailscale.
LM Studio @lmstudio

Qwen3.5-35B-A3B is now available in LM Studio! This model outperforms previous Qwen models that are more than 6x its size 🤯🚀 Requires ~21GB to run locally. lmstudio.ai/models/qwen/qw…

64 replies · 63 reposts · 1.2K likes · 169.3K views
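The memory figures in these posts (~21GB in LM Studio, 19.7 GB on a 3090 further down) are dominated by the 4-bit weights themselves. A rough weight-only footprint per precision, where the parameter counts are read off the model names in this thread and runtime overhead (activations, KV cache) is deliberately ignored:

```python
# Rough weight-only memory footprint at different precisions.
# Parameter counts come from the model names in this thread;
# activation and KV-cache overhead is ignored.
def weight_gib(n_params, bits):
    return n_params * bits / 8 / 2**30

for name, n in [("Qwen3.5-27B", 27e9),
                ("Qwen3.5-35B-A3B", 35e9),
                ("Qwen3.5-122B-A10B", 122e9)]:
    gib = {bits: round(weight_gib(n, bits), 1) for bits in (16, 8, 4)}
    print(name, gib)
```

At 4 bits the 35B weights come to roughly 16 GiB, consistent with the ~20GB totals quoted once runtime overhead is added; the 122B model lands under 60 GiB, which is how it fits an 80GB server GPU.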
Chen Cheng @cherry_cc12
With Qwen3-TTS Voice Design, you can shape tone and richness just using text. If you want more consistency: Generate the first segment with Voice Design, then continue with Voice Clone. Hope you enjoy it — would love to hear the fun voice styles you come up with. 🎙️
Daily Dose of Data Science @DailyDoseOfDS_

Big moment for text-to-speech. Qwen open-sourced a TTS model that lets you clone voices, design new ones & control speech using natural language. You can ask it "speak in a cheerful tone with slight nervousness," and it actually does that. No complex audio engineering needed!

2 replies · 1 repost · 46 likes · 4.5K views
xiaobao @bao_xiao78791
Content security warning: input data may contain inappropriate content! Why can't I sign in to Qwen with Google? 😂 It keeps showing this error.
[image attached]
1 reply · 0 reposts · 0 likes · 386 views
Chen Cheng @cherry_cc12
Multimodal. Right-sized. It's a solid start for the 397B model, but not quite there yet. Keep improving, Chong!
Arena.ai @arena

Top 10 Open Models: February 2026 in Text Arena. The top 3 labs have not changed since January, but the scores have gotten tighter between them:
- @Zai_org's GLM-5, scoring 1455
- @Alibaba_Qwen's Qwen-3.5 397B A17B, scoring 1454
- @Kimi_Moonshot's Kimi-K2.5 Thinking, scoring 1452
The spread widens from there. The open leaderboard remains tightly clustered at the top; single-digit swings can reshuffle the overall rankings. See thread for more details on shifts this month.

1 reply · 3 reposts · 78 likes · 57.3K views
Chen Cheng reposted
Sudo su @sudoingX
look what a single consumer GPU just built.

gave Qwen3.5-35B-A3B one prompt: build a cloud GPU marketplace with pricing cards, deploy templates, and a benchmark leaderboard. it planned the layout, wrote the animations, populated the data, and served it. one shot. one HTML file.

then i told it to iterate. split the hero, add a floating GPU with neural network animation. glassmorphism on the cards. done. done. done. three rounds, no confusion, no regressions.

4-bit quantized. 19.7 GB. single RTX 3090. full coding agent claude code harness running on localhost. no API calls leaving my machine. no subscription. no rate limits.

earlier today i pointed it at my own production website. it curled the HTML, found every broken link, and told me "pretty shell, empty core. would not recommend." then built a better version from scratch.

local inference stops being a demo when you actually steer it. the models are there. they understand intent. but you have to meet them halfway with good prompts, clear context, and real project structure. that's the skill gap now. not the models. the steering.

more experiments coming. i genuinely cannot stop playing with this thing.
Sudo su @sudoingX

this is the worst local AI will ever be. tomorrow it gets faster. next month the models get smarter. next year your GPU runs what a data center runs today.

Qwen3.5-35B-A3B on a single 3090. told it to visualize its own expert routing. 256 experts, 8 active per token, rendered in 3D on the same GPU running inference.

no API key. no subscription. no permission needed. closed AI isn't losing ground. it's losing the argument.

12 replies · 22 reposts · 255 likes · 37.1K views
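The "256 experts, 8 active per token" being visualized is standard top-k MoE gating. A minimal sketch of that routing step in plain Python — generic top-k gating over random stand-in scores, not Qwen's actual router:

```python
import math
import random

# Generic top-k MoE gating: keep the k highest-scoring experts
# and renormalize their softmax weights. The 256/8 shape matches
# the tweet; the logits here are random stand-in data.
def route(logits, k=8):
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    exps = [math.exp(logits[i]) for i in top]
    z = sum(exps)
    return {i: e / z for i, e in zip(top, exps)}

random.seed(0)
weights = route([random.gauss(0, 1) for _ in range(256)])
print(len(weights), round(sum(weights.values()), 6))
```

Only the 8 selected experts run for that token, which is why a 35B-parameter mixture can decode with roughly 3B active parameters' worth of compute.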
Qwen @Alibaba_Qwen
🔥 Qwen 3.5 Medium Model Series FP8 weights are now open and ready for deployment! Native support for vLLM and SGLang. Check the model card for example code. ⚡️ Optimize your workflow with FP8 precision. 👇 Get the weights: Hugging Face: huggingface.co/collections/Qw… ModelScope: modelscope.cn/collections/Qw…
22 replies · 57 reposts · 701 likes · 66.8K views
Chen Cheng @cherry_cc12
This shift is so real. Building a local camera analysis system with Qwen3-VL has been amazing! Now on weekends, I sit with my kid and turn his drawings and wild ideas into tiny playable games in minutes with Code Agent + Qwen. When code isn't a barrier and time isn't the cost, life just gets a lot more colorful. 🧒🎨🎮
Andrej Karpathy @karpathy

It is hard to communicate how much programming has changed due to AI in the last 2 months: not gradually and over time in the "progress as usual" way, but specifically this last December. There are a number of asterisks but imo coding agents basically didn’t work before December and basically work since - the models have significantly higher quality, long-term coherence and tenacity and they can power through large and long tasks, well past enough that it is extremely disruptive to the default programming workflow.

Just to give an example, over the weekend I was building a local video analysis dashboard for the cameras of my home so I wrote: “Here is the local IP and username/password of my DGX Spark. Log in, set up ssh keys, set up vLLM, download and bench Qwen3-VL, set up a server endpoint to inference videos, a basic web ui dashboard, test everything, set it up with systemd, record memory notes for yourself and write up a markdown report for me”. The agent went off for ~30 minutes, ran into multiple issues, researched solutions online, resolved them one by one, wrote the code, tested it, debugged it, set up the services, and came back with the report and it was just done. I didn’t touch anything.

All of this could easily have been a weekend project just 3 months ago but today it’s something you kick off and forget about for 30 minutes. As a result, programming is becoming unrecognizable. You’re not typing computer code into an editor like the way things were since computers were invented, that era is over.

You're spinning up AI agents, giving them tasks *in English* and managing and reviewing their work in parallel. The biggest prize is in figuring out how you can keep ascending the layers of abstraction to set up long-running orchestrator Claws with all of the right tools, memory and instructions that productively manage multiple parallel Code instances for you. The leverage achievable via top tier "agentic engineering" feels very high right now.
It’s not perfect, it needs high-level direction, judgement, taste, oversight, iteration and hints and ideas. It works a lot better in some scenarios than others (e.g. especially for tasks that are well-specified and where you can verify/test functionality). The key is to build intuition to decompose the task just right to hand off the parts that work and help out around the edges. But imo, this is nowhere near "business as usual" time in software.

0 replies · 0 reposts · 52 likes · 4K views