ModelScope

598 posts

@ModelScope2022

Driving innovations with open communities.

Hangzhou, China · Joined April 2024
91 Following · 6.6K Followers
Erika S @E_FutureFan
@ModelScope2022 I'm skeptical when small models claim SOTA, but beating Qwen3-VL-235B on OmniDocBench with just 3B? That's efficient scaling. How's it with degraded historical archives?
ModelScope @ModelScope2022
🔥 Meet dots.ocr-1.5: a 3B OCR model from Rednote-hilab with SOTA multilingual document parsing across virtually any writing system.
📊 Elo 1089 on olmOCR-Bench, 1157 on XDocParse — above GLM-OCR and PaddleOCR-VL-1.5
📄 OmniDocBench text edit 0.031, beats Qwen3-VL-235B (0.069) and Gemini 2.5 Pro (0.075)
🎨 SVG code output for charts, diagrams, and chemical formulas
🌐 Web parsing, scene text spotting, and object counting included
⚡ vLLM supported, runs on a single GPU
🤖 Model: modelscope.cn/models/rednote…
🔗 GitHub: github.com/rednote-hilab/…
🎠 Demo: dotsocr.xiaohongshu.com
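For readers unfamiliar with the "text edit" numbers quoted for OmniDocBench: they are normalized edit distances between predicted and reference text, so lower is better. A minimal sketch of that metric, assuming a common max-length normalization (the benchmark's exact variant may differ):

```python
def edit_distance(a: str, b: str) -> int:
    # Classic Levenshtein dynamic program, O(len(a) * len(b)).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def normalized_edit(pred: str, ref: str) -> float:
    # Edit distance scaled by the longer string; 0.0 means a perfect match.
    if not ref:
        return float(bool(pred))
    return edit_distance(pred, ref) / max(len(pred), len(ref))
```

On this scale, a score of 0.031 means roughly 3 character-level errors per 100 characters of reference text.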
kevin @tobeniceman
@ModelScope2022 Not bad. Why didn't it compare with GLM-OCR?
ModelScope @ModelScope2022
Say hi to Qianfan-OCR: a 4B end-to-end document intelligence model achieving SOTA among all end-to-end models on OmniDocBench v1.5 and olmOCR-Bench.
🏆 OmniDocBench v1.5: 93.12, beats DeepSeek-OCR-v2 and Gemini-3 Pro
🏆 KIE average 87.9, above Gemini-3.1-Pro and Qwen3-VL-235B-A22B
🧠 Layout-as-Thought: reasoning mode via token for complex layout recovery
🌍 192 languages supported
⚡ 1.024 PPS on A100 with W8A8 quantization
✍️ Apache 2.0. vLLM ready.
🤖 Model: modelscope.cn/models/baidu-q…
📄 Paper: modelscope.ai/papers/2603.13…
ModelScope @ModelScope2022
dots.mocr from Rednote: a 3B multimodal OCR model building on dots.ocr with stronger benchmarks and broader task coverage. 🚀
📊 Tops HunyuanOCR, GLM-OCR, and PaddleOCR-VL-1.5 across olmOCR-Bench, OmniDocBench v1.5, and XDocParse with an Elo average of 1124.7
🎨 Charts, UI layouts, and scientific figures parsed directly to SVG — dots.mocr-svg variant for dedicated image-to-SVG tasks
🌐 Web parsing, scene text spotting, and document QA all included
⚡ Integrated into vLLM from v0.11.0
📄 Apache 2.0.
🤖 Model: modelscope.cn/models/rednote…
🌍 Model: modelscope.ai/organization/r…
📄 Paper: modelscope.ai/papers/2603.13…
🔧 GitHub: github.com/rednote-hilab/…
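The Elo figures these OCR posts cite come from pairwise preference comparisons between models. A minimal sketch of the standard Elo arithmetic (the K-factor and pairing scheme here are assumptions; benchmark leaderboards may aggregate differently):

```python
def elo_expected(r_a: float, r_b: float) -> float:
    # Probability that model A beats model B under the Elo model.
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    # score_a: 1.0 = A wins the comparison, 0.5 = tie, 0.0 = A loses.
    delta = k * (score_a - elo_expected(r_a, r_b))
    return r_a + delta, r_b - delta
```

For example, a model rated 1124.7 is expected to win slightly more than half of its pairwise comparisons against one rated 1089.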
ModelScope @ModelScope2022
ModelScope Civision now supports FireRed-Image-Edit-1.1 🚀 Free image generation and training, ready to use. 👉 Give it a try: modelscope.cn/aigc
ModelScope @ModelScope2022
Step-3.5-Flash-SFT is open: the complete SFT training corpus, tokenizer snapshots, and pre-compiled StepTronOSS shards, all in one release.
📊 Dataset: modelscope.cn/datasets/stepf…
🧑‍💻 Code: github.com/stepfun-ai/Ste…
- Multi-turn conversation JSON with loss_mask and optional reasoning_content
- Tokenizers for Step-3.5-Flash and Qwen3 included for chat template alignment
- Pre-compiled shards: drop in and train, no preprocessing
- Reference recipes for both Step-3.5-Flash and Qwen3 variants
- Apache-2.0 + CC-BY-NC-2.0
🌉 Weights + training framework + SFT data. The full stack.
ModelScope @ModelScope2022

Step 3.5 Flash is now open source: model weights and full training framework (SteptronOSS), released together. 🚀
196B total parameters, 11B active. SWE-bench Verified 74.4% / Terminal-Bench 2.0 51.0%.
- MoE architecture: 288 routed experts + 1 shared, Top-8 activation per token
- MTP-3: predicts 4 tokens per forward pass, 100–300 tok/s typical, 350 tok/s peak
- 3:1 SWA ratio (1 full-attention + 3 sliding-window layers): 256K context at lower compute cost
- 💻 Runs on Mac Studio M4 Max and NVIDIA DGX Spark
- SteptronOSS: SFT, continued pretraining, RL (WIP)
- Apache 2.0
Two checkpoints released: Step-3.5-Flash-Base and Step-3.5-Flash-Base-Midtrain.
🤖 Base: modelscope.cn/models/stepfun…
🤖 Midtrain: modelscope.cn/models/stepfun…
🔧 Training framework: github.com/stepfun-ai/Ste…
📄 Paper: modelscope.cn/papers/2602.10…
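The Top-8-of-288 routing described in the quoted post can be sketched as softmax gating restricted to the top-k router logits. This is a toy illustration only; a production MoE router (load balancing, shared-expert mixing, any bias terms) is more involved:

```python
import math

def route_top_k(logits, k=8):
    # Select the k highest-scoring experts and renormalize their
    # softmax weights, mirroring Top-8 activation over 288 routed experts.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)                     # stabilize exp()
    exps = {i: math.exp(logits[i] - m) for i in top}
    z = sum(exps.values())
    return {i: exps[i] / z for i in top}                # expert -> weight
```

Each token's output is then a weighted sum of only those 8 experts' outputs (plus the shared expert), which is why 196B total parameters can run with only 11B active per token.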

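The loss_mask field in the SFT records above is typically used to exclude prompt and user tokens from the training loss. A minimal sketch, assuming the common -100 ignore-index convention; the record's field contents here are illustrative, not taken from the actual dataset:

```python
IGNORE_INDEX = -100  # label value excluded from cross-entropy loss

def masked_labels(input_ids, loss_mask):
    # Keep labels only where loss_mask == 1 (assistant tokens);
    # everything else is ignored when computing the loss.
    return [tok if m == 1 else IGNORE_INDEX
            for tok, m in zip(input_ids, loss_mask)]

record = {  # illustrative shape of a multi-turn conversation record
    "input_ids": [101, 7592, 102, 2023, 2003, 102],
    "loss_mask": [0, 0, 0, 1, 1, 1],
}
labels = masked_labels(record["input_ids"], record["loss_mask"])
```

Shipping loss_mask precomputed in the dataset means training code does not need to re-derive turn boundaries from the chat template.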
ModelScope @ModelScope2022
🚀 Skills Central is now live on ModelScope! 🎉 Dive in and explore the amazing Skills built by the open community: 🔗 modelscope.cn/skills
🛠️ Comprehensive coverage: spanning dev tools, frontend, code quality, multimedia, mobile, and cloud tooling.
⚡ Immediate integration: one-line installation into OpenClaw, Cursor, Qoder, and more, or grab the ZIP file with just one click.
🔍 Easy discovery: find the Skills you need in one place, with comprehensive bilingual (English and Chinese) documentation.
🔌 What's next: OpenAPI access, ModelScope SDK integrations, and more features on the way!
Together with our MCP Plaza, we hope the addition of Skills Central will facilitate better interactions between open models and the flourishing tooling ecosystem. Come build with us!
阿杰快跑CL @ajie_run_CL
I trained a LoRA model based on Qwen-Edit-2509 and used LoRA fine-tuning to deeply adapt it to traditional Chinese murals, focusing on applications such as digital restoration of murals and cultural heritage preservation. By bringing cutting-edge AI into traditional art restoration, the model not only improves the accuracy of digital restoration for damaged murals but also advances the intelligent management and intergenerational preservation of cultural heritage, ensuring these precious historical treasures are permanently preserved and widely accessible in digital form. Download link and more examples are in the comments section. @Ali_TongyiLab @Alibaba_Qwen @ModelScope2022 #HappyQwensday #QwenImageLoRA
ModelScope @ModelScope2022
Fun-CineForge is here! 🚀 Inference code and checkpoints just dropped: an end-to-end pipeline and multimodal LLM-based dubbing model built for diverse cinematic scenes.
🎬 Zero-shot dubbing across monologue, narration, dialogue, and multi-speaker scenes
🏗️ End-to-end dataset construction pipeline that generates large-scale annotated dubbing datasets from raw video
📦 CineDub-CN: first large-scale Chinese TV drama dubbing dataset with rich annotations and diverse scene types
🌐 English video support added, with CineDub-EN samples now available
🏆 Outperforms SOTA on audio quality, lip sync, timbre conversion, and instruction following across all scene types
🔓 Pipeline toolkit, model weights, and inference code fully open
🤖 Model: modelscope.cn/models/FunAudi…
🔧 GitHub: github.com/FunAudioLLM/Fu…
🎠 Demo: funcineforge.github.io
ModelScope @ModelScope2022
🎧 Fish Audio S2 Pro is open source: a 4B+400M Dual-AR TTS model with free-form inline prosody and emotion control, trained on 10M+ hours of audio across 80+ languages. 💬
🏗️ Dual-AR architecture: 4B Slow AR for semantics + 400M Fast AR for 9 residual codebooks — quality without inference overhead
🎭 Inline control via free-form tags: [whisper], [laughing], [professional broadcast tone] — 15,000+ unique tags, word-level precision
🌐 80+ languages; Tier 1: Japanese, English, Chinese
⚡ SGLang-native: continuous batching, paged KV cache, RadixAttention prefix caching — all inherited from the LLM serving stack
📊 RTF 0.195 on H200, ~100 ms time-to-first-audio, 3,000+ acoustic tokens/s
🔓 Weights + fine-tuning code + streaming inference engine all released
🌍 Model: modelscope.ai/models/fishaud…
🤖 Model: modelscope.cn/models/fishaud…
🔧 GitHub: github.com/fishaudio/fish…
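The RTF (real-time factor) number above is synthesis time divided by the duration of the audio produced, so values below 1.0 are faster than real time. A small sketch to make the arithmetic concrete:

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    # RTF = wall-clock time spent synthesizing / duration of audio produced.
    # RTF < 1.0 means faster than real time.
    return synthesis_seconds / audio_seconds

def real_time_speedup(rtf: float) -> float:
    # An RTF of 0.195 corresponds to roughly a 5.1x real-time speedup.
    return 1.0 / rtf
```

Together with ~100 ms time-to-first-audio, a low RTF is what makes streaming playback viable: the synthesizer stays ahead of the listener.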
ModelScope @ModelScope2022
14B faster than 1.3B: Helios is here 🚀 a 14B real-time long-video generation model running at 19.5 FPS on a single H100, with native T2V, I2V, and V2V support.
🌟 The breakthroughs:
- No anti-drifting heuristics: no self-forcing, no keyframe sampling — drift is simulated during training instead
- No standard acceleration tricks: no KV-cache, no sparse/linear attention, no quantization
- Compute cost matches 1.3B models via heavy context compression + reduced sampling steps
- Four 14B models fit in 80GB during training, no parallelism framework required
Outperforms prior methods on both short- and long-video benchmarks. Base + distilled models both released.
🤖 Models: modelscope.cn/collections/Be…
🌍 Models: modelscope.ai/profile/BestWi…
📄 Paper: modelscope.cn/papers/2603.04…
🔧 GitHub: github.com/PKU-YuanGroup/…
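"Real time at 19.5 FPS" translates to a fixed wall-clock budget per generated frame. A quick sketch of that arithmetic:

```python
def frame_budget_ms(fps: float) -> float:
    # Wall-clock budget per frame to sustain real-time generation;
    # at 19.5 FPS that is about 51.3 ms per frame.
    return 1000.0 / fps

def frames_needed(seconds: float, fps: float) -> int:
    # Total frames for a clip of the given duration.
    return round(seconds * fps)
```

So generating one minute of video at 19.5 FPS means producing 1,170 frames, each inside a ~51 ms budget on the H100.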
ModelScope @ModelScope2022
Meet Twinkle ✨, our fully open-source implementation enabling training via APIs! 🚀 With a clean, modular client-server paradigm, you can implement your RL training in ~150 lines of code with Twinkle.
Why you'll love it:
🏡 Multi-tenant: train multiple LoRAs on ONE shared base model at the same time.
⚡️ Performant: Megatron & Transformers support for fast, stable training.
🛠️ Flexible: drop in the Tinker API or use native Twinkle APIs for finer-grained control.
Built by the team behind ms-swift, with both client AND server implementations fully open-source. Run locally, clustered, or try our serverless service hosted on ModelScope today!
🔗 github.com/modelscope/twi…
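The multi-tenant idea of training many LoRAs against one frozen base can be illustrated with plain matrix arithmetic: each tenant contributes only a low-rank delta B @ A on top of a shared weight W, which is never modified. This is a toy concept sketch, not Twinkle's actual API (all names here are illustrative):

```python
def matmul(a, b):
    # Naive matrix product over nested lists.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

# One frozen base weight shared by every tenant (2x2 identity for the toy).
W = [[1.0, 0.0], [0.0, 1.0]]

# Per-tenant rank-1 adapters: each pair is (B: 2x1, A: 1x2).
tenants = {
    "tenant_a": ([[1.0], [0.0]], [[0.0, 2.0]]),
    "tenant_b": ([[0.0], [1.0]], [[3.0, 0.0]]),
}

def tenant_weight(name):
    # Effective weight for one tenant: W + B @ A. The base W is read-only,
    # so any number of tenants can share it simultaneously.
    B, A = tenants[name]
    return add(W, matmul(B, A))
```

Because only the small B and A matrices differ per tenant, the expensive base model lives in memory once while each tenant's optimizer touches only its own adapter.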
ModelScope @ModelScope2022
Style transfer with Qwen-Image-Edit-2511 + LoRA 🤩 Feed it any style reference and watch your artwork transform completely: color, mood, and atmosphere all carry over beautifully! Download the LoRA here 👉 modelscope.ai/models/daniel8…
大雄 @dx8152

This time, we're showcasing the Qwen-Image-Edit-2511 model, a fun LoRA model for migrating everything, used in the LoRA training competition. Download link and more examples are in the comments section. @Ali_TongyiLab @Alibaba_Qwen @ModelScope2022 #HappyQwensday #QwenImageLoRA
