Knowledgator
@knowledgator

594 posts

Open-source ML research company focused on information extraction #ExplainableAI #AI #opensource #InformationExtraction #UnstructuredData #NLP

Joined October 2023
96 Following · 396 Followers
Knowledgator retweeted
Ihor Stepanov @ihor_step:
GLiNER just hit a new record 🚀

We just released a new version of the package, and GLiNER reached almost 30k downloads per day. I couldn't be prouder of this milestone and the community behind it.

This release is packed with meaningful improvements:

Performance wins
▪️ Vectorized CPU-path preprocessing and decoding hot loops
▪️ Vectorized relation decoding, replacing costly .item() calls with torch.where
▪️ Batch-level span decoding, reducing CUDA kernel launches from B*8 to ~8

Architecture & quality
▪️ Restructured repo and updated documentation
▪️ Improved GLiNER-relex architecture and decoding

A huge thank you to @MaxWBuckley for inference optimizations, Vivek for building an evaluation pipeline for new relation extraction models, Bryan Bradfo for improving documentation, and @urchadeDS for his review. Every single contribution matters.

These improvements unlocked something bigger: we built RetriCo, a super-efficient GraphRAG system powered by the new GLiNER-relex models.

For us at @knowledgator, open source is how we build technologies. It's how we collaborate with developers worldwide. It's what makes our work meaningful. We believe open information extraction technologies are essential for decentralizing power in AI. The tools for understanding and structuring the world's knowledge shouldn't be locked behind closed doors.

Come build with us:
🔗 GLiNER: github.com/urchade/GLiNER
🔗 RetriCo: github.com/Knowledgator/R…
🔗 GLiNER Community: discord.gg/ZGsXs9GY
Knowledgator retweeted
𝚐𝔪𝟾𝚡𝚡𝟾 @gm8xx8:
Building on GLiClass-V3, GLiClass-Instruct extends the encoder-first backbone with instruction conditioning for real zero-shot workflows. The architecture remains the same, but adds hierarchical labels, few-shot examples, label descriptions, and task prompts to control behavior without fine-tuning. Latency stays stable as taxonomies scale, and V3 maintains ~0.72 average zero-shot F1 (large) while preserving the throughput advantages of the underlying design, positioning it as infrastructure for large-taxonomy classification and guardrail workloads.
𝚐𝔪𝟾𝚡𝚡𝟾 @gm8xx8 (quoted tweet):

GLiClass-V3: A family of encoder-only models that match or exceed DeBERTa-v3-Large in zero-shot accuracy, while delivering up to 50× faster inference.

Core Design:
- Single-pass inference: No cross-encoder pairing needed. One forward pass handles all labels.
- LoRA adapters: Fine-tuned on logic tasks (e.g., Formal Logic Reasoning, Commonsense QA) for symbolic generalization without catastrophic forgetting.
- Edge-ready: gliclass-edge-v3.0 hits 97 ex/s on an A6000, ideal for mobile and IoT.

GLiClass-V3 variants (gliclass-*), built on DeBERTa, ModernBERT, and Ettin for edge deployment:
- large-v3.0: 70.0% avg F1 (best)
- base-v3.0: 65.6%
- modern-large: 60.8%
- edge-v3.0: 48.7% (fastest, Ettin-based)
- x-base: 57% F1 (EN), 42% (multilingual) for robust multilingual zero-shot generalization

Benchmarks (zero-shot, no fine-tuning):
- CR, SST2, IMDb: ~0.93–0.94 F1
- Outperforms GLiClass-v2 and cross-encoders (e.g., DeBERTa-v3-Large, RoBERTa)
- Scales to 128+ labels with massive speedup (DeBERTa-Large: 0.25 ex/s vs GLiClass: 82.6)

Use cases:
- Multi-label classification (e.g., topic, sentiment, spam)
- RAG reranking
- Privacy-safe on-device NLP

Built on DeBERTa and ModernBERT. Fully open-source.

pip install gliclass
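The "single-pass inference" claim above is the core efficiency argument: a cross-encoder must run one forward pass per (text, label) pair, while a GLiClass-style encoder scores every label in one pass. A toy sketch, where a call counter stands in for real forward passes and the scoring itself is a dummy heuristic (none of this is GLiClass's actual API):

```python
# Toy contrast: per-pair cross-encoding vs single-pass multi-label scoring.
calls = {"cross": 0, "single": 0}

def cross_encoder_classify(text, labels):
    scores = {}
    for label in labels:                      # N forward passes for N labels
        calls["cross"] += 1
        scores[label] = float(label.lower() in text.lower())
    return scores

def single_pass_classify(text, labels):
    calls["single"] += 1                      # one pass handles all labels
    return {label: float(label.lower() in text.lower()) for label in labels}

labels = ["sports", "politics", "finance", "weather"]
text = "The finance minister commented on the weather before the match."
a = cross_encoder_classify(text, labels)
b = single_pass_classify(text, labels)
assert a == b                                # same answers
print(calls)                                 # cross grows with label count, single stays at 1
```

This is why latency can stay near-flat as taxonomies scale: label count changes the size of one input, not the number of passes.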

Knowledgator @knowledgator:
🧠 One lightweight model to classify, verify, and guard: introducing GLiClass-Instruct

GLiClass started as a fast zero-shot topic classifier rivaling cross-encoders at a fraction of the cost. But topic classification wasn't enough. Today we're launching GLiClass-Instruct: a multitask model that can handle various tasks via sequence classification.

✨ What's new
• Hierarchical labeling for complex taxonomies
• Few-shot in-context learning (no fine-tuning)
• Natural language prompting for task control
• EWC to add skills without forgetting old ones
• 3× faster inference with FlashDeBERTa

🚀 New multi-task capabilities
Beyond topic + sentiment classification, GLiClass-Instruct now supports:
• Hallucination detection (is an answer grounded in context?)
• Rule-following verification (does content follow your guidelines?)
• Safety classification (prompt injections, jailbreaks, harmful requests)

These are key building blocks for agents, RAG pipelines, and LLM guardrails, where every input/output must be screened and verified at minimal latency.

🔗 GitHub: github.com/Knowledgator/G…
🤗 Models: huggingface.co/knowledgator/g…
🌍 More: knowledgator.com
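Hallucination detection, as described above, is just sequence classification over a (context, answer) pair with labels like "grounded" / "not grounded". A real GLiClass-Instruct model learns this decision from data; the token-overlap heuristic below is only a hypothetical stand-in to make the framing concrete:

```python
# Toy "is this answer grounded in the context?" classifier.
# The overlap heuristic is illustrative only, not what the model computes.

def _tokens(s):
    return [t.strip(".,!?;:") for t in s.lower().split()]

def grounded_label(context, answer, threshold=0.6):
    ctx = set(_tokens(context))
    ans = _tokens(answer)
    if not ans:
        return "not_grounded"
    overlap = sum(t in ctx for t in ans) / len(ans)
    return "grounded" if overlap >= threshold else "not_grounded"

context = "GLiNER reached almost 30k downloads per day after the new release."
print(grounded_label(context, "GLiNER reached almost 30k downloads per day."))
print(grounded_label(context, "GLiNER was downloaded a billion times."))
```

The same two-text-plus-labels shape covers rule-following and safety checks too, which is what makes one lightweight classifier usable as a general guardrail.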
Knowledgator retweeted
Ihor Stepanov @ihor_step:
🚀 Exciting news: GLiNER meets Transformers v5!

We've just released a new version of GLiNER, packed with major updates and improvements:

▪️ Transformers v5 support: fully compatible with the latest Hugging Face Transformers updates.
▪️ Improved FlashDeBERTa integration: train GLiNER much faster and handle longer sequences more efficiently.
▪️ New architecture types: an advanced token-level subarchitecture that enables recognition of spans of any length. This was a game changer for our entity linking models, and the new token-level type works for both uni-encoder and bi-encoder setups.
▪️ Span-constrained prediction: you can now provide input spans, ensuring GLiNER predictions are always constrained to them. This enables chaining multiple GLiNER models for more weighted and controllable results.
▪️ Plus several smaller fixes and improvements.

📦 Upgrade now: pip install gliner -U

🔗 Resources:
• Entity Linking models: huggingface.co/collections/kn…
• Entity Linking framework (GLinker): github.com/Knowledgator/G…
• GLiNER repository: github.com/urchade/GLiNER

🙏 Huge thanks to all contributors, especially @bioMikeee for fixing bugs, @tomaarsen for helping navigate the Transformers v5 update, and @urchadeDS for reviewing the changes.
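The span-constrained prediction idea above boils down to restricting candidate predictions to a caller-supplied set of spans, which is what lets one model's output gate the next in a chain. A minimal pure-Python sketch (not GLiNER's actual API; the tuples and names are invented for illustration):

```python
# Keep only predictions that fall exactly on an allowed input span.

def constrain_to_spans(predictions, allowed_spans):
    """predictions: list of (start, end, label, score);
    allowed_spans: iterable of (start, end) pairs."""
    allowed = set(allowed_spans)
    return [p for p in predictions if (p[0], p[1]) in allowed]

preds = [(0, 4, "person", 0.91), (6, 12, "city", 0.55), (14, 20, "org", 0.80)]
spans = [(0, 4), (14, 20)]     # e.g., spans proposed by an upstream model
print(constrain_to_spans(preds, spans))
```

In a chained setup, the upstream model's accepted spans become `allowed_spans` for the downstream model, so downstream scores re-rank a fixed candidate set instead of free-form decoding.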
𝚐𝔪𝟾𝚡𝚡𝟾 @gm8xx8:
A solid release from Knowledgator: GLiNER-bi-V2, a bi-encoder upgrade to GLiNER for zero-shot NER.

Core change vs original GLiNER: the bi-encoder decouples text encoding from label encoding, so you can precompute/cache label embeddings and score against 1k+ entity types with near-constant inference cost (and constant text-encode memory regardless of label count).

Stack: ModernBERT-based text encoder (Ettin) + sentence-transformer/BGE label encoder, with optional cross-attn fusion.

The value is system-level efficiency: cheaper scaling, faster taxonomy iteration, and realistic million-entity linking via GLiNKER to Wikidata-scale KBs.

Numbers: 61.5 Micro-F1 (CrossNER, zero-shot) and up to 130× throughput at 1024 labels with precomputed embeddings. The 194M base reaches ~98% of the large model's accuracy at ~2.6× higher throughput; Large adds ~+1.2 F1 at ~2–3× lower throughput.
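The precompute/cache pattern described above can be sketched with a toy bi-encoder: the label encoder runs once per unique label, its output is cached, and scoring each new text costs one text encoding plus cheap dot products. The hash-based "embeddings" are dummies standing in for real transformer encoders:

```python
# Toy bi-encoder: cached label embeddings + per-text encoding.
import hashlib

encoder_calls = {"label": 0, "text": 0}

def embed(s, kind):
    encoder_calls[kind] += 1                  # count simulated encoder passes
    h = hashlib.sha256(s.encode()).digest()
    return [b / 255 for b in h[:8]]           # tiny fake embedding

label_cache = {}

def label_embedding(label):
    if label not in label_cache:              # encode each label once, then reuse
        label_cache[label] = embed(label, "label")
    return label_cache[label]

def score(text, labels):
    t = embed(text, "text")                   # exactly one text encoding per call
    return {lab: sum(a * b for a, b in zip(t, label_embedding(lab)))
            for lab in labels}

labels = ["person", "city", "organization"]
score("Ada Lovelace lived in London.", labels)
score("Knowledgator is an ML research company.", labels)
print(encoder_calls)   # label encoder ran once per label, not once per (text, label)
```

This is why inference cost stays near-constant as the label set grows: with a cross-attention or uni-encoder design, every new label would re-enter the expensive encoder alongside the text.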
Knowledgator @knowledgator:
🚀 Meet GLinker: an ultra-fast, modular, zero-shot entity linking system.

Back in 2024, we developed the GLiNER bi-encoder, which enabled zero-shot NER across hundreds of entity types. But that was just the start. Our real goal was linking text to millions of entities dynamically, without retraining. After ~2 years of research & engineering, it's here.

github.com/Knowledgator/G…
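To make "linking text to millions of entities dynamically" concrete, here is a hypothetical sketch of the modular shape such a system can take: an NER stage proposes mention strings, then a lightweight alias index maps each mention to knowledge-base candidates with no retraining. The tiny dict stands in for a Wikidata-scale store, and nothing here is GLinker's real API:

```python
# Toy entity linking: alias index over a mini knowledge base.
kb = {
    "Q90":  {"name": "Paris",  "aliases": {"paris", "city of light"}},
    "Q142": {"name": "France", "aliases": {"france", "french republic"}},
}

# Invert the KB once: alias string -> entity id.
alias_index = {alias: qid for qid, e in kb.items() for alias in e["aliases"]}

def link(mentions):
    """mentions: surface strings proposed by an upstream NER stage."""
    return {m: alias_index.get(m.lower()) for m in mentions}

print(link(["Paris", "French Republic", "Atlantis"]))
```

Because the KB side is an index built offline, adding or swapping entities is a data change, not a training run, which is what "dynamic, without retraining" buys you.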
Knowledgator retweeted
Ihor Stepanov @ihor_step:
You can now bake a SoTA zero-shot NER model in 7 minutes 🍞

I still remember my first zero-shot NER model back in 2022. It required millions of examples and days of training to get something usable.

Fast forward to today: the same model class can be trained in minutes, literally for ~$0.30, using a modern NVIDIA A6000 Pro rented on Google Cloud spot instances.

This is possible thanks to multiple optimizations in GLiNER, including:
🔹 FlashDeBERTa
🔹 Sequence packing
🔹 Better data
🔹 Improved training procedures

🔗 Check how to train GLiNER here: urchade.github.io/GLiNER/trainin…

What's important is that this progress didn't come from a single breakthrough. It required advances across model architecture and optimizations, data quality, GPUs, and infrastructure. AI progress is fundamentally multidimensional, and when these dimensions compound, the result can look exponential. And we're still early 🔥
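Sequence packing, one of the optimizations named above, means concatenating short training examples into near-full buffers so fewer tokens are wasted on padding. A greedy first-fit sketch over example lengths (the exact packing strategy GLiNER uses may differ; this just shows the idea):

```python
# Greedy first-fit sequence packing: place each example in the first
# buffer it fits in, opening a new buffer only when none has room.

def pack_sequences(lengths, max_len):
    buffers = []
    for n in sorted(lengths, reverse=True):   # longest-first improves fill rate
        for buf in buffers:
            if sum(buf) + n <= max_len:
                buf.append(n)
                break
        else:
            buffers.append([n])
    return buffers

lengths = [120, 48, 200, 64, 30, 256, 90]
packed = pack_sequences(lengths, max_len=512)
assert all(sum(b) <= 512 for b in packed)
print(len(packed), "buffers instead of", len(lengths), "padded rows")
```

Seven examples collapse into two 512-token buffers here, so the model sees mostly real tokens per step instead of padding, which is a large part of how wall-clock training time drops to minutes.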
Knowledgator @knowledgator:
FlashDeBERTa is already available in our zero-shot classification framework; check the "flash-attention-backends" section of the README: github.com/Knowledgator/G…
Knowledgator @knowledgator:
💡 What's new:
🔹 2–5× efficiency vs torch DeBERTa
🔹 Lower memory footprint
🔹 Forward + backward support
🔹 3 kernels auto-selected by input characteristics
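"Kernels auto-selected by input characteristics" means a dispatcher picks one of several implementations per call based on properties of the input. The thresholds and kernel names below are invented for illustration; FlashDeBERTa's real heuristics live in its kernel code:

```python
# Toy kernel dispatcher: choose an implementation from input shape.

def pick_kernel(seq_len, head_dim):
    if seq_len <= 128:
        return "short_seq_kernel"     # latency-optimized, no tiling needed
    if head_dim <= 64:
        return "tiled_kernel"         # a whole head fits in fast shared memory
    return "fallback_kernel"          # general path for everything else

print(pick_kernel(64, 64))            # short sequences take the cheap path
print(pick_kernel(2048, 64), pick_kernel(2048, 128))
```

The benefit is that no single kernel has to be fastest everywhere: each path is tuned for its regime, and callers never choose manually.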
Knowledgator @knowledgator:
🚀 FlashDeBERTa v2 is out: faster, leaner, now with backward support.

DeBERTa keeps proving that great architecture ages well. With new optimized kernels, FlashDeBERTa delivers 2–5× speedups, lower memory use, and full training support.

github.com/Knowledgator/F…
Knowledgator retweeted
Ihor Stepanov @ihor_step:
🚀 Big News for the GLiNER Community! We're rolling out one of the largest updates in GLiNER's history — an upgrade that makes the framework more mature, flexible, and reliable than ever. Here's what's new 🧵👇
Knowledgator @knowledgator:
Open-source works best with community input ❤️

If you've used GLiNER, GLiClass, or Comprehend-it, we want to hear from you.
👍 What worked for you?
🛠️ What needs improvement?
💭 What should we build next?

Feedback form: docs.google.com/forms/d/e/1FAI…