PaddlePaddle

814 posts


@PaddlePaddle

The first independent R&D and Open-Source deep learning platform in China. Powering the ERNIE model family.

Shenzhen, China · Joined September 2022

172 Following · 9K Followers

Pinned Tweet

PaddlePaddle@PaddlePaddle·29 Jan

🚀 PaddleOCR-VL-1.5 is here! A new SOTA for document parsing!
🔍 94.5% accuracy on OmniDocBench v1.5 with just 0.9B parameters, outperforming leading general-purpose LLMs and document-specific models under standardized benchmarks.
📐 Irregular-shaped localization delivers robust parsing across real-world conditions (scans, skewed or warped pages, screen photos, and challenging lighting), achieving comprehensive SOTA results.
🧾 Text spotting + seal recognition, both reaching new state-of-the-art performance, alongside major gains in table, formula, and text recognition.
🌍 Stronger multilingual & specialized support: rare characters, ancient texts, multilingual tables, underlines, checkboxes, now with extended coverage including Tibetan and Bengali.
📄 Built for long documents with automatic cross-page table merging and heading recognition, reducing fragmentation at scale.
🔥 Smarter. More robust. Production-ready. PaddleOCR-VL-1.5 sets a new bar for real-world document intelligence.
How to use ⬇️
Official website / API 👉 paddleocr.com
Open-source repository 👉 github.com/PaddlePaddle/P…
Model download 👉 huggingface.co/PaddlePaddle/P…
#PaddleOCR #OCR


101

708

94.5K

PaddlePaddle@PaddlePaddle·10m

@isstfix Thanks for sharing! Pls check DM.


blank@isstfix·17h

@PaddlePaddle Dm me I want to share urdu datasets with you because your model is useless for urdu documents


228

PaddlePaddle@PaddlePaddle·18h

🚀 Big Upgrade: PaddleOCR Website Just Got a Major Boost!
More pages. Faster parsing. Better batch workflows.
The latest PaddleOCR website update is built for real-world document workloads, from long PDFs to high-volume processing.
What’s new
📄 10,000 free pages/day for individual users
⏱️ New async parsing service for long documents and heavy jobs
📚 Up to 1,000 pages per file, so large PDFs no longer need splitting
⚙️ Stronger concurrency & batch processing with a major backend upgrade
Why it matters
With the async service and higher throughput, PaddleOCR now handles long and large-scale document parsing far more efficiently, making enterprise-grade OCR workflows easier to access, test, and scale.
🌐 Try it now: paddleocr.com
💬 Feedback: paddleocr@baidu.com
🔧 GitHub: github.com/PaddlePaddle/P…
And with PaddleOCR Skills already live on ClawHub, your OpenClaw workflows can now process documents even faster and better. 💪
#PaddleOCR #OCR #DocumentAI #OpenSource #ClawHub


143

7.1K

PaddlePaddle@PaddlePaddle·11m

@EditorEnBici Absolutely — welcome to test, build, and share your feedback with us!


CEO del Socialismo de Mercado🌹🕊️ 市场社会主义CEO@EditorEnBici·16h

@PaddlePaddle Works in Spanish and for children's textbooks with illustrations and diagrams that will be on the test?


118

PaddlePaddle@PaddlePaddle·13m

🤯 Wow! Everyone is welcome to try it out, put it into real workflows, and share feedback with us!
And if you’re building with OpenClaw, don’t miss two official PaddlePaddle Skills we highly recommend:
🔹 「paddleocr-doc-parsing」: advanced document parsing with PaddleOCR, returning full document structure including text, tables, formulas, charts, and layout information
🔗 clawhub.ai/Bobholamovic/p…
🔹 「paddleocr-text-recognition」: structured text extraction from images, PDFs, and documents, with support for both URLs and local files, returning OCR results in structured JSON
🔗 clawhub.ai/Bobholamovic/p…
Give them a try and let us know what you build 🚀

Baidu Inc.@Baidu_Inc

As our @openclaw lineup continues to grow, it now covers multiple products and an expanding set of ClawHub skills — helping support more real-world tasks across desktop, mobile, browser, home, search, productivity, and content workflows!


PaddlePaddle retweeted

ERNIE for Developers@ErnieforDevs·2d

Excited to share that ERNIE 5.0 is now integrated with @eigent_ai — The Best Open Source Cowork Desktop built for multi-agent collaboration. Just set ERNIE 5.0 as default, turn on Docx Skill, and let Eigent handle the entire pipeline — research, write, deliver. Done. Try it out and see how much faster your workflow can be 👇 🔗 eigent.ai 🔗 github.com/eigent-ai/eige… Get free ERNIE 5.0 API tokens 👇 🔗aistudio.baidu.com/account/access…


2.9K

PaddlePaddle retweeted

karminski-牙医@karminski3·6d

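I built a "flash-note Lobster"! I have to say Lao Luo gave me the inspiration. I have long had an idea for an AI reading-notes app: take a photo, circle the part you want to make a note of, and the AI writes the reading note for you automatically. Today I tried it with Lobster + PaddleOCR and had it working in 5 minutes (so there is really no need to write an app at all; Lobster has truly killed off a good part of the classic three-piece set of programmer startup ideas...).
Now I just take a photo with my phone camera and send it to Lobster as shown in the image, and it takes the note for me automatically. If the model is strong enough and the passage is a fairly famous one, it can even tag which book it comes from (I still don't know how it knew the passage I photographed was from Spice and Wolf...).
The reason for using PaddleOCR to recognize book photos is simple: there is a ready-made skill on clawhub, so there is no need to reinvent the wheel, and its recognition rate is extremely high. I ran a test for everyone earlier: even if the paper has been crumpled into a ball, as long as it can be unfolded, it can be recognized. So this is the key to the success rate. I took a bunkobon (small-format paperback) and tested it in low light: perfect recognition.
#文心 #文心大模型 #PaddleOCR #飞浆 #龙虾 #OpenClaw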


105

13.2K

PaddlePaddle@PaddlePaddle·9 Mar

🔥 PaddleOCR-VL is now available in the llama.cpp ecosystem
This brings document parsing VLMs closer to local and lightweight deployment workflows, making it easier for developers to explore portable, community-friendly multimodal document AI.
Why it matters
🔹 Easier access to PaddleOCR-VL in GGUF-based workflows
🔹 More flexible paths for local inference and lightweight deployment
🔹 A simpler way to experiment with structured document understanding beyond traditional OCR stacks
⚠️ Important note for developers
If llama-cli works but llama-server throws errors, try explicitly passing --chat-template-file when launching llama-server.
Chat template / GGUF resources
👉 PaddleOCR-VL-1.5-GGUF: modelscope.cn/models/megemin…
👉 PaddleOCR-VL-GGUF: modelscope.cn/models/megemin…
A meaningful step toward making document intelligence more open, more portable, and more developer-ready.
#PaddleOCR #llamacpp #GGUF #DocumentAI #MultimodalAI #OCR #OpenSourceAI
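
The llama-server tip above can be sketched as a small launcher helper. This is only an illustration: the post names just the --chat-template-file flag, so the other llama-server flags shown (-m, --mmproj, --jinja, --port) are my assumption about a typical multimodal launch, and every file name is a hypothetical placeholder for whatever you downloaded.

```python
import shlex


def llama_server_command(model_path, mmproj_path, template_path, port=8080):
    """Build an argv list that launches llama-server with an explicit chat template."""
    return [
        "llama-server",
        "-m", model_path,                        # GGUF weights of the language model
        "--mmproj", mmproj_path,                 # GGUF multimodal projector (assumed to be needed)
        "--jinja",                               # use Jinja chat-template processing
        "--chat-template-file", template_path,   # the flag recommended in the post
        "--port", str(port),
    ]


# All three file names below are hypothetical placeholders.
cmd = llama_server_command(
    "PaddleOCR-VL-1.5.gguf",
    "mmproj-PaddleOCR-VL-1.5.gguf",
    "chat_template.jinja",
)
print(shlex.join(cmd))
```

Keeping the command as a list means it can be passed to subprocess.run(cmd) without shell quoting issues; shlex.join is used only to print a copy-pasteable line.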


4.8K

PaddlePaddle@PaddlePaddle·7 Mar

🚀 FlashMaskV4: Leveling up with FlashAttention-4
With @tri_dao officially releasing the FlashAttention-4 paper, we are thrilled to announce the continued evolution of FlashMaskV4! Building on our FlashMask research (arxiv.org/abs/2410.01359), we’ve integrated FA4’s core power to deliver the sweet spot between masking flexibility and hardware-limit throughput.
🔥 Why FlashMaskV4?
🔹 FA4 powered: fully leverages the latest FlashAttention-4 kernels for next-gen efficiency.
🔹 Column-wise sparse masking: optimized support for diverse masks (Prefix LM Document, Share Question, etc.) across both FWD and BWD.
🔹 Massive speedups: up to 2.9x faster in FWD and 1.6x in total compared to base FA4 mask_mod (8k seq).
🔹 Long-context mastery: maintains high efficiency and stability from 8k to 128k sequence lengths.
No more choosing between custom attention logic and peak FLOPS. ⚡️
Explore the code & benchmarks:
🔗 github.com/PaddlePaddle/f…
#FlashAttention4 #FlashMaskV4 #MachineLearning #OpenSourceAI #LLM #PaddlePaddle
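
To make "column-wise sparse masking" concrete, here is a toy sketch of the general idea as I understand it from the FlashMask paper: instead of storing an N x N mask, store for each key column the query row at which masking begins. The function names are hypothetical, the example covers only a causal document mask, and none of this is the FlashMaskV4 API or its kernel code.

```python
def causal_document_starts(doc_lengths):
    """For each key column, return the first query row that may NOT attend to it.

    In a causal document mask a key is visible only to later queries of the
    same document, so masking starts at the row where that document ends.
    """
    starts = []
    end = 0
    for length in doc_lengths:
        end += length
        starts.extend([end] * length)
    return starts


def dense_mask(starts):
    """Expand the per-column vector into an N x N matrix (True = may attend)."""
    n = len(starts)
    # Row i may attend to column j if j is not in the future (j <= i)
    # and row i lies before the masked interval of column j.
    return [[j <= i < starts[j] for j in range(n)] for i in range(n)]


starts = causal_document_starts([2, 3])   # two packed documents of length 2 and 3
mask = dense_mask(starts)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

The compact form needs one integer per column (O(N) memory) rather than N x N booleans, which is what lets a kernel skip fully masked tiles without ever materializing the dense matrix.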


137

7.5K

PaddlePaddle@PaddlePaddle·6 Mar

🚀 RAGFlow × PaddleOCR-VL-1.5: a powerful new integration for document RAG
PaddleOCR-VL-1.5 is now integrated into RAGFlow’s DeepDoc Parser, bringing stronger document understanding to the very first step of the RAG pipeline.
Why it stands out
🔹 Better parsing for scans, photos, distortion, and complex layouts
🔹 Polygon-level localization for more precise element detection
🔹 Cross-page table merging + heading continuity for long documents
🔹 Visual citation grounding for more traceable and trustworthy retrieval
From messy PDFs to structured, citation-ready knowledge, now built directly into RAGFlow.
Learn more
👉 PaddleOCR-VL-1.5: github.com/PaddlePaddle/P…
👉 RAGFlow: github.com/infiniflow/rag…
👉 Quick start: ragflow.io
#RAGFlow #PaddleOCR #RAG #DocumentAI


119

24.9K

PaddlePaddle retweeted

Baidu Inc.@Baidu_Inc·3 Mar

What differentiates China's approach to AI? Last November, @TIME's @CharlieCamp6ell attended our annual Baidu World conference in Beijing and spoke with our CEO Robin Li. China, Robin said, places greater emphasis on applications. He described AI development not as a singular pursuit of AGI, but as a pyramid built on foundational layers of chips and models, with applications at the top. Much of the value generated today remains concentrated at the base. To sustain investment across the ecosystem, greater value must be realized at the application layer. Watch the interview that just came out here: youtu.be/xvSEw8AqPtA?si…


6.6K

PaddlePaddle retweeted

Arena.ai@arena·24 Feb

📈 Arena Trends Update
We pulled Arena scores for the Top 10 labs in Text for the past 6 months (Sept 2025 to Feb 2026), and the competitive spread is shifting again. With tighter confidence intervals and new entries in the mix, the frontier continues to shift.
Stay tuned for more insights as we dive deeper into the top open models for February later this week. Let us know what you found the most surprising in the comments. 👇


251

23.3K

PaddlePaddle retweeted

Baidu Inc.@Baidu_Inc·16 Feb

x.com/i/article/2023…


65.9K

PaddlePaddle retweeted

ERNIE for Developers@ErnieforDevs·10 Feb

Take a closer, technical look at our recent release of ERNIE 5.0: a 2.4 trillion-parameter unified multimodal foundation model.
Key highlights:
🔹 2.4 trillion parameters: a massive-scale foundation model built on a unified autoregressive backbone.
🔹 Unified objective: we map all modalities to a shared token space and optimize them end-to-end using a unified Next-Group-of-Tokens Prediction objective.
🔹 Omni-capability: by effectively dissolving modality barriers, the model achieves seamless multimodal understanding and generation.
Read the full technical report here: ernie.baidu.com/blog/posts/ern…


128

10.5K

PaddlePaddle@PaddlePaddle·11 Feb

Shoutout to our Day-0 integration partner, @Haystack_AI! 🎉 Give it a try 👉 haystack.deepset.ai/integrations/p…

Haystack@Haystack_AI

📃 @PaddlePaddle released a new VLM: PaddleOCR-VL-1.5 (0.9B), and you can already use it in your document-heavy Haystack pipelines.
PaddleOCR-VL-1.5 goes beyond classic OCR: it understands document layout and structure, extracting tables, formulas, charts, and key elements from messy real-world PDFs and images. This makes it a powerful building block for reliable RAG pipelines and document-centric AI applications.
For teams building AI systems for reasoning over complex documents, this enables more accurate retrieval, grounding, and reasoning across document structure.
🔍 Why it’s exciting:
- 94.5% accuracy on OmniDocBench v1.5
- Irregular-shaped localization for real-world documents (skew, warp, photos)
- Strong improvements in table, formula, and text spotting
- Multilingual support, including rare scripts and complex layouts
🔗 Model: huggingface.co/PaddlePaddle/P…
🔗 Docs: haystack.deepset.ai/integrations/p…


865

PaddlePaddle@PaddlePaddle·10 Feb

📚 Read the report here: arxiv.org/pdf/2602.04705

Baidu Inc.@Baidu_Inc

📘 The ERNIE 5.0 technical report is out!
Inside, we unpack how our model was built, covering architecture, pre-training, post-training, and infrastructure, such as:
> ultra-sparse MoE architecture with modality-agnostic expert routing
> unified multimodal training from scratch to avoid the "ability seesaw"
> a novel elastic training paradigm for efficient scaling
Check out the thread for the full report ↓


932

PaddlePaddle@PaddlePaddle·10 Feb

🚨 Big Drop: PaddleOCR Skill lands on @openclaw!
Today, we’re excited to announce that the PaddleOCR Document Parsing Skill is now live on ClawHub, ready to plug directly into OpenClaw workflows.
Instead of deploying OCR services or wiring APIs, developers can now invoke PaddleOCR as a standardized, composable Skill node, embedding document understanding directly into Agents and automation pipelines.
Built on PaddleOCR-VL-1.5, the Skill delivers
✅ Multi-format parsing (PDF, JPG, PNG, BMP, TIFF)
✅ Layout analysis: text, tables, formulas, headers
✅ 110+ language coverage
✅ Structured Markdown output preserving hierarchy
⛽️ No deployment. No wrappers. Just configuration: build your document intelligence chain inside OpenClaw.
👉 Try the Skill: clawhub.ai/Bobholamovic/p…
👉 Explore more: github.com/PaddlePaddle/P…
#OpenClaw #PaddleOCR #AgentTools #AIInfra #OpenSource #DocumentAI


156

10.8K

PaddlePaddle@PaddlePaddle·10 Feb

👏 PaddleOCR-VL-1.5 (Baidu)

Adina Yakup@AdinaYakup

A bit late due to the flu 😅 but still very worth sharing: China open source highlights for January 2026 🔥 huggingface.co/collections/zh…
✨ Qwen released 4 new series, completing key agent primitives (I/O, memory, alignment):
- Qwen3-TTS
- Qwen3-ASR
- Qwen3-VL-Reranker
- Qwen3-VL-Embedding
✨ Ant Group is clearly moving into robotics & embodied AI, releasing the LingBot series:
- LingBot-VA
- LingBot-World
- LingBot-VLA
- LingBot-Depth
✨ Strong OCR models continue to stand out:
- DeepSeek-OCR-2
- PaddleOCR-VL-1.5 (Baidu)
✨ 100% trained on local chips:
- GLM-Image from Z.ai
- TeleChat-36B-Thinking
✨ Vision + multimodality is becoming clearly product-oriented:
- Unipic3 / SkyReels-V3-R2V from Skywork
- UniVideo from Kling
- Step3-VL from StepFun
- GLM-Image
- Z-Image from Tongyi / Alibaba
- HunyuanImage-3.0 from Tencent
✨ Small but practical models:
- Youtu (2B) / WeDLM (8B) from Tencent
- AgentCPM from OpenBMB
✨ Large-scale LLMs:
- Kimi-K2.5 (171B, Moonshot)
- LongCat-Flash-Thinking (562B, Meituan)
January already brought many exciting surprises, looking forward to what February will bring 👀


2.7K

PaddlePaddle@PaddlePaddle·6 Feb

🚀 Big release! PaddleFormers v1.0: Reimagining LLM Training Efficiency
Introducing PaddleFormers v1.0, a full-stack large model training toolkit built on PaddlePaddle, engineered to streamline adaptation, boost performance, and enable deployment across diverse hardware environments.
Why it matters 👇
🔹 End-to-end training suite: from pre-training to post-training for LLMs & VLMs, supporting 100+ mainstream large language and vision-language models, including ERNIE-4.5, ERNIE-4.5-VL, DeepSeek-V3, the GLM-4.5 series, the Qwen2/3 series, and Qwen3-VL, as well as comprehensive training capability for the document-focused PaddleOCR-VL model.
🔹 High-performance training: FP8 optimization, hybrid parallelism, compute-comm overlap, and memory balancing deliver industry-leading efficiency, outperforming Megatron-LM on key models.
🔹 Developer-first workflow: Transformers-style APIs, YAML+CLI configs, and one-line training launch dramatically reduce engineering overhead.
🔹 Full-stack ecosystem readiness: native compatibility with mainstream weight formats and inference stacks for seamless train-to-deploy pipelines.
🔹 Hardware flexibility: deep support for domestic compute platforms alongside heterogeneous environments, expanding real deployment options.
⛽️ PaddleFormers v1.0 empowers teams to train, optimize, and productionize large models faster, turning scalable AI development into an accessible engineering workflow.
Explore the release 👉 github.com/PaddlePaddle/P…


116

8.5K

PaddlePaddle@PaddlePaddle·4 Feb

🏆 Honored and grateful! Huge thanks to @huggingface for the recognition and this golden award celebrating our 1K+ followers milestone.🙏 Appreciate the continued support from the open-source ecosystem—excited to keep building and growing together! #Baidu #AI #HuggingFace


1.2K

PaddlePaddle@PaddlePaddle·3 Feb

🎉 70k+ GitHub Stars — and climbing! Today, PaddleOCR has officially surpassed 70K stars on @github @GitHubCommunity . A milestone powered by a global developer community and proven in real-world, production-scale OCR. ⭐ Star the repo and join us — Thank you for building, shipping, and pushing the limits with us! 🚀 Link 👉 github.com/PaddlePaddle/P… #PaddleOCR #GitHub #OCR


2.3K
