刘聪NLP

60 posts

@logcong0120

US · Joined February 2024
126 Following, 77 Followers
刘聪NLP
刘聪NLP@logcong0120·
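Feishu's open-source CLI passed 10,000 stars in just over 40 days, which I did not expect. That is fast compared with other office-suite CLIs, so I take it as a sign of recognition. In the AI era, Feishu's openness speaks for itself: it is heavily AI-oriented and convenient to use. Competition among Chinese office suites is no longer only about app experience or large-model capability; it is also about who can open up their business capabilities as Agent-friendly infrastructure.
刘聪NLP tweet media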
0
0
0
62
刘聪NLP
刘聪NLP@logcong0120·
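Just now, 4 hours ago, Feishu open-sourced its CLI, that is, a command-line tool. Agents can now operate Feishu directly from the terminal. It covers messaging, docs, multidimensional tables, spreadsheets, calendar, mail, tasks, meetings, and more (see image 3), plus 19 AI Agent Skills (see image 4). This speeds up the real-world rollout of Agent AI office applications~
刘聪NLP tweet media
刘聪NLP tweet media
刘聪NLP tweet media
刘聪NLP tweet media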
0
0
0
84
刘聪NLP retweeted
Junyang Lin
Junyang Lin@JustinLin610·
me stepping down. bye my beloved qwen.
1.7K
726
13.5K
6.6M
刘聪NLP
刘聪NLP@logcong0120·
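December open-source model roundup: which model update are you most looking forward to in 2026? It is the last day of the month again and the roundup is done, covering 27 models from DeepSeek, Zhipu, Xiaomi, MiniMax, Meituan, StepFun, Qwen, Tongyi, Tencent, and others. mp.weixin.qq.com/s/4Vzbk2u4jowc…
刘聪NLP tweet media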
0
0
2
175
Orange AI
Orange AI@oran_ge·
Witnessing the limits of banana once again... So detailed, and not a single mistake in the Chinese characters.
Orange AI tweet media
32
63
503
64.5K
刘聪NLP
刘聪NLP@logcong0120·
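October open-source model roundup. Qwen kept pushing, releasing Qwen3-VL in every size; Hunyuan held first place in world models; Ant shipped three models in a row, Ling, Ring, and Ming; Meituan turned to multimodal and moved into video; OCR was the biggest highlight, with DeepSeek strong on concept and Paddle strong on practical deployment; MiniMax, Kuaishou, and others also kept shipping new releases. mp.weixin.qq.com/s/CtNq_ZLj_JE5…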
0
0
1
288
刘聪NLP
刘聪NLP@logcong0120·
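Throughout September, the open-source LLM community stayed fiercely competitive. Alibaba open-sourced Qwen3-Omni, Qwen3-Next, Qwen3-VL, and more; Tencent open-sourced 7 models. Both are now in mass-production mode in the open-source community, hahaha~ And of course there were Meituan's LongCat-Thinking, Kuaishou's Keye-VL1.5, ModelBest's VoxCPM, and many more! DeepSeek-V3.2 and GLM4.6 also came out in the last two days. zhuanlan.zhihu.com/p/195640481989…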
0
1
2
491
刘聪NLP
刘聪NLP@logcong0120·
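Kling-Avatar from Kling (可灵) launched yesterday, and the AI digital-human space is getting interesting. The overall model structure is shown in the image below. It has three core parts: a storyline generation module (MLLM Director), a blueprint video generation module, and a final video generation module. Hands-on test: attached is my own version of 认真的雪, followed by 聪别的 "Hey Kong".
刘聪NLP tweet media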
0
0
1
602
刘聪NLP
刘聪NLP@logcong0120·
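Overall impressions:
- Adding /think triggers deep thinking, adding /no_think gives a direct answer, and adding neither is auto mode, where the model decides for itself
- Keye-VL1.5 understands short videos well, including some meme-style videos
- OCR and image understanding are also good
- Grounding has been specifically optimized and can localize precisely
- Because the model is only 8B, it still falls somewhat short on world knowledge, spatial logic, and spatial transformations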
刘聪NLP@logcong0120

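Kuaishou open-sourced Keye-VL1.5. The architecture is still the classic three-piece set: a vision encoder (ViT), an MLP projection layer, and an LLM decoder. For video, it introduces Slow-Fast video encoding: key actions are examined closely with high-definition slow shots, while static backgrounds are skimmed with smooth fast shots, which saves compute without losing detail. mp.weixin.qq.com/s/e3262cNNJPv4…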
0
0
0
109
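The mode switch described in the impressions above can be sketched as a small prompt helper. This is only an illustration: the post does not say where the tag goes in the prompt, so appending it at the end is an assumption, and `apply_thinking_mode` is a hypothetical helper, not part of any Keye-VL1.5 API.

```python
from typing import Literal

Mode = Literal["think", "no_think", "auto"]

# Tags are taken from the post; placing them at the end of the prompt is an assumption.
_MODE_TAGS = {"think": "/think", "no_think": "/no_think", "auto": ""}


def apply_thinking_mode(prompt: str, mode: Mode = "auto") -> str:
    """Return the prompt with the mode tag appended (hypothetical helper)."""
    if mode not in _MODE_TAGS:
        raise ValueError(f"unknown mode: {mode!r}")
    tag = _MODE_TAGS[mode]
    text = prompt.rstrip()
    # In auto mode no tag is added, so the model decides whether to think.
    return f"{text} {tag}" if tag else text


if __name__ == "__main__":
    for mode in ("think", "no_think", "auto"):
        print(repr(apply_thinking_mode("Describe what happens in this video.", mode)))
```

Keeping the tag logic in one place makes it easy to compare the three modes on the same prompt.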
刘聪NLP
刘聪NLP@logcong0120·
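Kuaishou open-sourced Keye-VL1.5. The architecture is still the classic three-piece set: a vision encoder (ViT), an MLP projection layer, and an LLM decoder. For video, it introduces Slow-Fast video encoding: key actions are examined closely with high-definition slow shots, while static backgrounds are skimmed with smooth fast shots, which saves compute without losing detail. mp.weixin.qq.com/s/e3262cNNJPv4…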
0
0
2
213
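To make the Slow-Fast idea above concrete, here is a toy sketch of splitting frames into a detailed "slow" path and a cheap "fast" path. It is not Keye-VL1.5's actual implementation: the mean-absolute-difference measure, the threshold, the per-frame token budgets, and the names `split_slow_fast` and `token_cost` are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def mean_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute pixel difference between two flattened frames."""
    if len(a) != len(b) or not a:
        raise ValueError("frames must be non-empty and the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def split_slow_fast(frames: Sequence[Sequence[float]], threshold: float = 0.1) -> List[str]:
    """Label each frame 'slow' (changed a lot, keep detail) or 'fast' (mostly static)."""
    labels: List[str] = []
    last_slow: Optional[Sequence[float]] = None
    for frame in frames:
        # Compare against the most recent slow frame, not just the previous frame.
        if last_slow is None or mean_abs_diff(frame, last_slow) > threshold:
            labels.append("slow")
            last_slow = frame
        else:
            labels.append("fast")
    return labels


def token_cost(labels: Sequence[str], slow_tokens: int = 256, fast_tokens: int = 64) -> int:
    """Total visual tokens under hypothetical per-frame budgets."""
    return sum(slow_tokens if label == "slow" else fast_tokens for label in labels)


if __name__ == "__main__":
    frames = [[0.0, 0.0], [0.0, 0.05], [1.0, 1.0], [1.0, 1.0]]
    labels = split_slow_fast(frames)
    print(labels, token_cost(labels), "tokens instead of", 256 * len(frames))
```

The compute saving in this sketch comes purely from giving near-duplicate frames a smaller token budget.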
刘聪NLP retweeted
Zhihu Frontier
Zhihu Frontier@ZhihuFrontier·
📅 China's Open-Source LLM Boom in August: a detailed recap by Zhihu mind explorer @logcong0120
🔎 TL;DR: The open-source race in China is still intense: more players, more models, more action.
Did you miss any? 👇
• Aug 1 · XBai-o4 (32B) by @theMetaStoneAI: Based on Qwen3-32B, excels in complex reasoning, beats OpenAI-o3-mini.
• Aug 4 · @TencentHunyuan released 4 small models (0.5B–7B) as Qwen3 competitors.
• Aug 4 · @Alibaba_Qwen Qwen-Image: Text-to-image model with fine-grained layout + paragraph rendering.
• Aug 4 · @Xiaomi MiDashengLM-7B: Audio LLM that outperforms Qwen2.5 & Kimi in audio understanding.
• Aug 6 · @xiaohongshu dots.vlm1, combining NaViT visual encoder + DeepSeek V3 LLM.
• Aug 7 · Qwen3-4B-Instruct & -Thinking (Dense models)
• Aug 8 · @OpenBMB MiniCPM-V-4 (4B): Real-time video/image understanding on phones & PCs.
• Aug 11 · Baichuan-M2-32B (Medical LLM) & GLM4.5-V (@Zai_org, 106B MoE with "thinking mode")
• Aug 12 · Lumina-mGPT 2.0 (Shanghai AI Lab): Decoder-only model for unified vision tasks.
• Aug 12 · Kuaishou Klear-Reasoner-8B
• Aug 13 · StepFun-Prover-32B (theorem-proving)
• Aug 14 · @TencentHunyuan Hunyuan-GameCraft: Interactive game video generation from image + text + actions.
• Aug 18 · @StepFun_ai NextStep-1: Includes a 14B LLM + image generation/editing model.
• Aug 19 · @Alibaba_Qwen Qwen-Image-Edit (20B): Brings precision text rendering into image editing.
• Aug 20 · @deepseek_ai DeepSeek-V3.1: Improved coding, slightly weaker on general text.
• Aug 21 · ByteDance Seed-OSS (36B)
• Aug 23 · @intern_lm Intern-S1-mini (8B): Strong for scientific tasks.
• Aug 26 · @OpenBMB MiniCPM-V 4.5 (8B): High-frame-rate video understanding · @intern_lm InternVL 3.5 series: 9 models, Dense + MoE
• Aug 26 · @Alibaba_Wan Wan2.2-S2V-14B: Text + image + audio → lifelike digital human video.
• Aug 28 · HunyuanVideo-Foley: Auto sound effects for video · @BytedanceTalk USO: Style + subject controllable image generation
• Aug 31 · @Meituan_LongCat (560B MoE): Dynamic routing activates 18.6B–31.3B parameters per query.
💡 Missed any? Catch the full recap on Zhihu (CN): zhuanlan.zhihu.com/p/194578262172…
#ChinaAI #LLM #OpenSource #Multimodal #AI
Zhihu Frontier tweet media
0
6
15
690
刘聪NLP
刘聪NLP@logcong0120·
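InternVL3.5 is open-sourced, from 1B to 241B. Intern (书生) released the InternVL-3.5 models, 9 in total. The Dense models are 1B, 2B, 4B, 8B, 14B, and 38B; the MoE models are InternVL3.5-20B-A4B, InternVL3.5-30B-A3B InternViT-300M, and InternVL3.5-241B-A28B (see image 2). In testing, the 241B-A28B model beats GLM4.5V and trails only the closed-source GPT5 and Gemini2.5-pro (see image 3).
刘聪NLP tweet media
刘聪NLP tweet media
刘聪NLP tweet media
刘聪NLP tweet media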
0
0
2
585
刘聪NLP
刘聪NLP@logcong0120·
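What can a small model like 0.6B actually do? That was this morning's soul-searching question from people online. I'm on the subway, so here is a quick chat about it. It started when someone in a group chat wanted to build a support-ticket intent classifier but had no resources and asked what to do. I told them to just deploy a 0.6B Qwen3 model on CPU with ollama; with only a few categories, it should be fine. Then other people hit me with the soul-searching question: what can a 0.6B model even do these days, it isn't smart at all.
刘聪NLP tweet media
刘聪NLP tweet media
刘聪NLP tweet media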
0
0
1
186
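A minimal sketch of the ticket-intent setup suggested above, assuming a local Ollama server on its default port 11434 with a small Qwen3 model pulled under the tag `qwen3:0.6b`. The label set, the prompt wording, and the helper names are hypothetical, and stripping a `<think>` block from the reply is an assumption about the output format.

```python
import json
import re
import urllib.request

# Hypothetical intent labels; replace with the real ticket categories.
LABELS = ["billing", "login_issue", "refund", "other"]
OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint
MODEL = "qwen3:0.6b"  # assumed model tag


def build_prompt(ticket: str) -> str:
    """Build a constrained classification prompt listing the allowed labels."""
    options = ", ".join(LABELS)
    return (
        f"Classify the support ticket into exactly one of these intents: {options}.\n"
        "Answer with the intent name only.\n\n"
        f"Ticket: {ticket}\nIntent:"
    )


def parse_label(raw: str) -> str:
    """Map raw model text to a known label, falling back to 'other'."""
    # Drop any reasoning block so only the final answer is matched.
    text = re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL).lower()
    for label in LABELS:
        if label in text:
            return label
    return "other"


def classify(ticket: str) -> str:
    """Send one non-streaming generate request to the local Ollama server."""
    payload = json.dumps(
        {"model": MODEL, "prompt": build_prompt(ticket), "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.loads(response.read().decode("utf-8"))
    return parse_label(body.get("response", ""))
```

With the server running, `classify("I was charged twice this month")` returns one of the labels. Constraining the answer to a fixed label list and falling back to `other` is what keeps a very small model usable for this kind of task.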
刘聪NLP retweeted
Adina Yakup
Adina Yakup@AdinaYakup·
Intern-S1-mini 🔥 lightweight open multimodal reasoning model by @intern_lm huggingface.co/internlm/Inter…
✨ Efficient 8B LLM + 0.3B vision encoder
✨ Apache 2.0
✨ 5T multimodal pretraining, 50%+ in scientific domains
✨ Dynamic tokenizer for molecules & protein sequences
4
28
203
32.2K
刘聪NLP
刘聪NLP@logcong0120·
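It feels like, for Chinese LLMs, not open-sourcing is now a sin. ByteDance has also open-sourced the Seed-OSS model: 36B, a sweet-spot size, and there is also a pretrained model with synthetic data removed. mp.weixin.qq.com/s/I823_ajeTG_s…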
0
0
1
135
刘聪NLP
刘聪NLP@logcong0120·
ByteDance is going open source too. The 36B is presumably meant to take on Qwen's 32B; a model of this size is badly needed.
刘聪NLP tweet media
0
0
0
63
刘聪NLP
刘聪NLP@logcong0120·
I wanted this Labubu, but Grok drew me a Labrador.
刘聪NLP tweet media
刘聪NLP tweet media
0
0
0
114
刘聪NLP retweeted
Zhihu Frontier
Zhihu Frontier@ZhihuFrontier·
🚀 China's Open-Source AI Models Are Booming! In July alone, 🇨🇳 flooded @huggingface Trending with 9/10 top models - open-sourced by Chinese teams.
Here's your lightning-fast recap compiled by Zhihu Mind Explorer @logcong0120
Now let’s break down the July drop:
📅 June 27 – @TencentHunyuan releases Hunyuan A13B: 80B total, 13B active params. Fills the 70-80B size gap.
📅 June 30 – @Baidu_Inc open-sources ERNIE 4.5: full-size LLMs + multimodal versions.
📅 July 1 – @Alibaba_Qwen drops ThinkSound, the first CoT audio model for frame-level video dubbing.
📅 July 2 – @Zai_org releases GLM-4.1V-Thinking (9B), a powerful vision-language model.
📅 July 4 – 昆仑万维 launches Skywork-Reward-V2, 8 different reward models (600M-80B).
📅 July 8 – @AntGroup open-sources KAG-Thinker, a deep reasoning model for multi-hop cognitive tasks.
📅 July 9 – 昆仑万维 again, with Skywork-R1V3, a multimodal model fine-tuned from InternVL-38B.
📅 July 11 – @Kimi_Moonshot Kimi-K2 gets 12K+ downloads in 20 minutes, with Base + Instruct models.
📅 July 12 – Zhihu unveils Zhi-Create, a creative writing model fine-tuned on Qwen3-32B.
📅 July 19 – @BytedanceTalk drops Seed-X, a multilingual translation series (7B) covering Instruct, RM, PPO.
📅 July 21-25 – @Alibaba_Qwen open-sources three giants: Qwen3-235B-A22B-Instruct, Qwen3-Coder-480B-A35B-Instruct and Qwen3-235B-A22B-Thinking.
📅 July 26 – Shanghai AI Lab unveils Intern-S1, a massive 241B multimodal reasoning model.
📅 July 27 – @TencentHunyuan drops HunyuanWorld-1, the first open 3D immersive, interactive, and simulated world generation model. Game dev, VR, content creation = transformed.
📅 July 28 – @Alibaba_Wan goes big with Wan2.2, the first MoE-based video generation foundation model. Includes: T2V (text-to-video), I2V (image-to-video), TI2V (unified video generation).
📅 July 28 – @Zai_org releases GLM-4.5 series: GLM-4.5 355B-A32B and GLM-4.5-Air 106B-A12B, shot straight to the top of HuggingFace.
📅 July 30 – @Alibaba_Qwen adds two “friendly size” releases: Qwen3-30B-A3B-Instruct and Qwen3-30B-A3B-Thinking
📅 July 30 – 昆仑万维 launches Skywork-UniPic-1.5B, a unified multimodal model for image understanding, generation, and editing.
📅 July 31 – @StepFun_ai open-sources Step 3, setting new benchmarks in multimodal reasoning efficiency.
🤔 Open-sourcing isn't just technical - it’s strategic, and China's making moves at a blistering pace.
📖 Full article on Zhihu: zhuanlan.zhihu.com/p/193419697965…
#OpenSource #LLM #AI #ChinaAI
Zhihu Frontier tweet media
Zhihu Frontier tweet media
1
2
9
1K