刘聪NLP

60 posts

刘聪NLP

@logcong0120

US Katılım Şubat 2024

126 Takip Edilen77 Takipçiler

刘聪NLP@logcong0120·15 May

飞书开源CLI，没想到，40多天star就破万了。相较于其他办公的CLI来说，star速度很快，算是认可吧。毕竟飞书在AI时代，开放性不言而喻，AI浓度极高，用起来也是很方便。现在，国内办公套件的竞争，已经不仅是App体验之争、大模型能力之争了，还有谁能把自己的业务能力开放成Agent友好的基础设施。

中文

刘聪NLP@logcong0120·28 Mar

就在刚刚，4h前，飞书开源了CLI，也就是命令行工具。 Agent 可以直接终端中操作飞书了，共涉及消息、文档、多维表格、电子表格、日历、邮箱、任务、会议等场景，见图3，及 19 个 AI Agent Skills，见图4. 加速了，Agent AI办公应用落地~

中文

刘聪NLP retweetledi

Junyang Lin@JustinLin610·3 Mar

me stepping down. bye my beloved qwen.

English

1.7K

726

13.5K

6.6M

刘聪NLP@logcong0120·31 Ara

12月开源模型汇总，2026最期待哪个模型更新又到了一个月的最后一天，已经汇总，涉及DeepSeek、智谱了、小米、MiniMax、美团、阶跃、Qwen、通义、腾讯、阶跃等27个模型。 mp.weixin.qq.com/s/4Vzbk2u4jowc…

中文

175

刘聪NLP@logcong0120·9 Ara

@oran_ge 总是能突破极限

中文

587

Orange AI@oran_ge·9 Ara

又见证 banana 的极限了。。。如此详尽，中文字还完全没出错

中文

503

64.5K

刘聪NLP@logcong0120·31 Eki

10月开源模型汇总，但Qwen仍在发力，Qwen3-VL尺寸全开；混元稳居世界模型第一；蚂蚁连发Ling、Ring、Ming三款模型；美团转向多模态，进军Video；OCR成最大爆点，DeepSeek高立意、Paddle强落地；MiniMax、快手等也持续上新。 mp.weixin.qq.com/s/CtNq_ZLj_JE5…

中文

288

刘聪NLP@logcong0120·30 Eyl

整个9月，大模型开源社区依旧很卷，阿里开源Qwen3-Omni、Qwen3-Next、Qwen3-VL等模型；腾讯开源7个模型，二位现在在开源社区都是量产，哈哈哈~ 当然，还有美团LongCat-Thinking、快手Keye-VL1.5、面壁VoxCPM等等等等！最后两天DeepSeek-V3.2、GLM4.6也都出了， zhuanlan.zhihu.com/p/195640481989…

中文

491

刘聪NLP@logcong0120·16 Eyl

昨天可灵Kling-Avatar上线，AI数字人这一块有点意思。模型的整体结构如下图所示，核心有三部分，故事线生成模块-MLLM Director、蓝图视频生成模块、最终视频生成模块。实测，附上自己的认真的雪，再来一首聪别的Hey Kong。

中文

602

刘聪NLP@logcong0120·9 Eyl

cool

Junyang Lin@JustinLin610

github.com/huggingface/tr…

English

162

刘聪NLP@logcong0120·5 Eyl

整体感受， - 带/think深度思考，带/no_think直接回答，什么都不带是auto模式，自己判断 - Keye-VL1.5对于短视频的理解很不错，一些玩梗的视频可以理解 - OCR和图片理解也不错 - Grounding做了专门的优化，可以精准定位 - 因为模型只有8B大小，对于世界知识、空间逻辑还有空间变换还是存在一定的欠缺

刘聪NLP@logcong0120

快手开源Keye-VL1.5 模型结构还是经典的三件套，视觉编码器（ViT）、MLP映射层，大模型解码器（LLM）。对于视频，创新地提出了Slow-Fast 视频编码，就是把视频里关键动作用高清慢镜头细看，静止背景用流畅快镜头扫过，既省算力又不丢细节。 mp.weixin.qq.com/s/e3262cNNJPv4…

中文

109

刘聪NLP@logcong0120·5 Eyl

中文

213

刘聪NLP retweetledi

Zhihu Frontier@ZhihuFrontier·1 Eyl

📅 China's Open-Source LLM Boom in August — A detailed recap by Zhihu mind explorer @logcong0120 🔎 TL;DR: The open-source race in China is still intense — more players, more models, more action. Did you miss? 👇 • Aug 1 · XBai-o4 (32B) by @theMetaStoneAI: Based on Qwen3-32B, excels in complex reasoning, beats OpenAI-o3-mini. • Aug 4 · @TencentHunyuan released 4 small models (0.5B–7B) as Qwen3 competitors. • Aug 4 · @Alibaba_Qwen Qwen-Image: Text-to-image model with fine-grained layout + paragraph rendering. • Aug 4 · @Xiaomi MiDashengLM-7B: Audio LLM that outperforms Qwen2.5 & Kimi in audio understanding. • Aug 6 · @xiaohongshu dots.vlm1, combining NaViT visual encoder + DeepSeek V3 LLM. • Aug 7 · Qwen3-4B-Instruct & -Thinking (Dense models) • Aug 8 · @OpenBMB MiniCPM-V-4 (4B): Real-time video/image understanding on phones & PCs. • Aug 11 · Baichuan-M2-32B (Medical LLM) & GLM4.5-V @Zai_org , 106B MoE with "thinking mode") • Aug 12 · Lumina-mGPT 2.0 (Shanghai AI Lab): Decoder-only model for unified vision tasks. • Aug 12 · Kuaishou Klear-Reasoner-8B • Aug 13 · StepFun-Prover-32B (theorem-proving) • Aug 14 · @TencentHunyuan Hunyuan-GameCraft: Interactive game video generation from image + text + actions. • Aug 18 · @StepFun_ai NextStep-1: Includes a 14B LLM + image generation/editing model. • Aug 19 · @Alibaba_Qwen Qwen-Image-Edit (20B): Brings precision text rendering into image editing. • Aug 20 · @deepseek_ai DeepSeek-V3.1: Improved coding, slightly weaker on general text. • Aug 21 · ByteDance Seed-OSS (36B) • Aug 23 · @intern_lm Intern-S1-mini (8B): Strong for scientific tasks. • Aug 26 · @OpenBMB MiniCPM-V 4.5 (8B): High-frame-rate video understanding · @intern_lm InternVL 3.5 series: 9 models, Dense + MoE • Aug 26 · @Alibaba_Wan Wan2.2-S2V-14B: Text + image + audio → lifelike digital human video. • Aug 28 · HunyuanVideo-Foley: Auto sound effects for video · @BytedanceTalk USO: Style + subject controllable image generation • Aug 31 ·@Meituan_LongCat (560B MoE): Dynamic routing activates 18.6B–31.3B parameters per query. 💡 Missed any? Catch the full recap on Zhihu (CN): zhuanlan.zhihu.com/p/194578262172… #ChinaAI #LLM #OpenSource #Multimodal #AI

English

690

刘聪NLP@logcong0120·26 Ağu

InternVL3.5开源，从1B到241B 书生开源了InternVL-3.5 模型，共9个模型，Dense模型有1B、2B、4B、8B、14B、38B，MoE模型有InternVL3.5-20B-A4B、InternVL3.5-30B-A3B InternViT-300M、InternVL3.5-241B-A28B，见图2 测试效果，241B-A28B模型超过GLM4.5V，仅次于闭源的GPT5和Gemini2.5-pro，见图3

中文

585

刘聪NLP@logcong0120·26 Ağu

0.6B 这种小模型能干啥？今天早上来自网友的灵魂拷问？地铁时间，闲聊一下起因是有个群友想做一个工单意图分类，但是没有资源，问怎么办？我直接让它ollama cpu部署一个0.6的qwen3模型，类别不多的情况下，应该没有问题，然后就受到了其他人的灵魂拷问，现在0.6B模型还能干啥，一点都不智能

中文

186

刘聪NLP retweetledi

Adina Yakup@AdinaYakup·21 Ağu

Intern-S1-mini 🔥 lightweight open multimodal reasoning model by @intern_lm huggingface.co/internlm/Inter… ✨ Efficient 8B LLM + 0.3B vision encoder ✨ Apache 2.0 ✨ 5T multimodal pretraining, 50%+ in scientific domains ✨ Dynamic tokenizer for molecules & protein sequences

English

203

32.2K

刘聪NLP@logcong0120·21 Ağu

有中感觉，国内大模型，现在不开源，会有罪。字节也开源了Seed-OSS模型，36B，甜点尺寸，还有剔除融合数据的预训练模型 mp.weixin.qq.com/s/I823_ajeTG_s…

中文

135

刘聪NLP@logcong0120·20 Ağu

mp.weixin.qq.com/s/Np0m5YoZNMoz… 我的DeepSeek-V3.1实测，感觉并没用提高

中文

187

刘聪NLP@logcong0120·20 Ağu

字节也要开源了，36B要PK Qwen的32B吧，急需这个尺寸的模型

中文

刘聪NLP@logcong0120·14 Ağu

我想要这个拉布布，但是Grok给我画了个拉布拉多

中文

114

刘聪NLP retweetledi

Zhihu Frontier@ZhihuFrontier·1 Ağu

🚀 China's Open-Source AI Models Are Booming! In July alone, 🇨🇳 flooded @huggingface Trending with 9/10 top models - open-sourced by Chinese teams. Here's your lightning-fast recap compiled by Zhihu Mind Explorer @logcong0120 Now let’s break down the July drop: 📅 June 27 – @TencentHunyuan releases Hunyuan A13B: 80B total, 13B active params. Fills the 70-80B size gap. 📅 June 30 – @Baidu_Inc open-sources ERNIE 4.5: full-size LLMs + multimodal versions. 📅 July 1 – @Alibaba_Qwen drops ThinkSound, the first CoT audio model for frame-level video dubbing. 📅 July 2 – @Zai_org releases GLM-4.1V-Thinking (9B), a powerful vision-language model. 📅 July 4 – 昆仑万维 launches Skywork-Reward-V2, 8 different reward models (600M-80B). 📅 July 8 – @AntGroup open-sources KAG-Thinker, a deep reasoning model for multi-hop cognitive tasks. 📅 July 9 – 昆仑万维 again, with Skywork-R1V3, a multimodal model fine-tuned from InternVL-38B. 📅 July 11 – @Kimi_Moonshot Kimi-K2 gets 12K+ downloads in 20 minutes, with Base + Instruct models. 📅 July 12 – Zhihu unveils Zhi-Create, a creative writing model fine-tuned on Qwen3-32B. 📅 July 19 – @BytedanceTalk drops Seed-X, a multilingual translation series (7B) covering Instruct, RM, PPO. 📅 July 21-25 – @Alibaba_Qwen open-sources three giants: Qwen3-235B-A22B-Instruct, Qwen3-Coder-480B-A35B-Instruct and Qwen3-235B-A22B-Thinking. 📅 July 26 – Shanghai AI Lab unveils Intern-S1, a massive 241B multimodal reasoning model. 📅 July 27 – @TencentHunyuan drops HunyuanWorld-1, the first open 3D immersive, interactive, and simulated world generation model. Game dev, VR, content creation = transformed. 📅 July 28 – @Alibaba_Wan goes big with Wan2.2, the first MoE-based video generation foundation model. Includes: T2V (text-to-video), I2V (image-to-video), TI2V (unified video generation). 📅 July 28 – @Zai_org releases GLM-4.5 series: GLM-4.5 355B-A32B and GLM-4.5-Air 106B-A12B, shot straight to the top of HuggingFace. 📅 July 30 – @Alibaba_Qwen adds two “friendly size” releases: Qwen3-30B-A3B-Instruct and Qwen3-30B-A3B-Thinking 📅 July 30 – 昆仑万维 launches Skywork-UniPic-1.5B, a unified multimodal model for image understanding, generation, and editing. 📅 July 31 – @StepFun_ai open-sources Step 3, setting new benchmarks in multimodal reasoning efficiency. 🤔 Open-sourcing isn't just technical - it’s strategic, and China's making moves at a blistering pace. 📖 Full article on Zhihu: zhuanlan.zhihu.com/p/193419697965… #OpenSource #LLM #AI #ChinaAI

English

Keşfet

@oran_ge @theMetaStoneAI @TencentHunyuan @Alibaba_Qwen @Xiaomi @xiaohongshu @OpenBMB @Zai_org