Knowledgator
@knowledgator

594 posts

Open-source ML research company focused on information extraction #ExplainableAI #AI #opensource #InformationExtraction #UnstructuredData #NLP

Joined October 2023
96 Following · 396 Followers
Knowledgator retweeted
Ihor Stepanov @ihor_step:
GLiNER just hit a new record 🚀

We just released a new version of the package, and GLiNER reached almost 30k downloads per day. I couldn't be prouder of this milestone and the community behind it.

This release is packed with meaningful improvements:

Performance wins
▪️ Vectorized CPU-path preprocessing and decoding hot loops
▪️ Vectorized relation decoding, replacing costly .item() calls with torch.where
▪️ Batch-level span decoding, reducing CUDA kernel launches from B*8 to ~8

Architecture & quality
▪️ Restructured repo and updated documentation
▪️ Improved GLiNER-relex architecture and decoding

A huge thank you to @MaxWBuckley for inference optimizations, Vivek for building an evaluation pipeline for new relation extraction models, Bryan Bradfo for improving documentation, and @urchadeDS for his review. Every single contribution matters.

These improvements unlocked something bigger: we built RetriCo, a super-efficient GraphRAG system powered by the new GLiNER-relex models.

For us at @knowledgator, open source is how we build technologies. It's how we collaborate with developers worldwide. It's what makes our work meaningful. We believe open information extraction technologies are essential for decentralizing power in AI. The tools for understanding and structuring the world's knowledge shouldn't be locked behind closed doors.

Come build with us:
🔗 GLiNER: github.com/urchade/GLiNER
🔗 RetriCo: github.com/Knowledgator/R…
🔗 GLiNER Community: discord.gg/ZGsXs9GY
Knowledgator retweeted
𝚐𝔪𝟾𝚡𝚡𝟾 @gm8xx8:
Building on GLiClass-V3, GLiClass-Instruct extends the encoder-first backbone with instruction conditioning for real zero-shot workflows. The architecture remains the same, but adds hierarchical labels, few-shot examples, label descriptions, and task prompts to control behavior without fine-tuning. Latency stays stable as taxonomies scale, and V3 maintains ~0.72 average zero-shot F1 (large) while preserving the throughput advantages of the underlying design, positioning it as infrastructure for large-taxonomy classification and guardrail workloads.
𝚐𝔪𝟾𝚡𝚡𝟾 @gm8xx8 (quoted tweet):

GLiClass-V3: A family of encoder-only models that match or exceed DeBERTa-v3-Large in zero-shot accuracy, while delivering up to 50× faster inference.

Core Design:
- Single-pass inference: No cross-encoder pairing needed. One forward pass handles all labels.
- LoRA adapters: Fine-tuned on logic tasks (e.g., Formal Logic Reasoning, Commonsense QA) for symbolic generalization without catastrophic forgetting.
- Edge-ready: gliclass-edge-v3.0 hits 97 ex/s on an A6000, ideal for mobile and IoT.

GLiClass-V3 variants (gliclass-*), built on DeBERTa, ModernBERT, and Ettin for edge deployment:
- large-v3.0: 70.0% avg F1 (best)
- base-v3.0: 65.6%
- modern-large: 60.8%
- edge-v3.0: 48.7% (fastest, Ettin-based)
- x-base: 57% F1 (EN), 42% (multilingual) for robust multilingual zero-shot generalization

Benchmarks (zero-shot, no fine-tuning):
- CR, SST2, IMDb: ~0.93–0.94 F1
- Outperforms GLiClass-v2 and cross-encoders (e.g., DeBERTa-v3-Large, RoBERTa)
- Scales to 128+ labels with massive speedup (DeBERTa-Large: 0.25 ex/s vs GLiClass: 82.6)

Use cases:
- Multi-label classification (e.g., topic, sentiment, spam)
- RAG reranking
- Privacy-safe on-device NLP

Built on DeBERTa and ModernBERT. Fully open-source.

pip install gliclass
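The "single-pass inference" claim above is the core efficiency argument: a cross-encoder must run one forward pass per (text, label) pair, while a GLiClass-style encoder scores every label in one pass. A toy sketch, where a call counter stands in for real forward passes and the scoring itself is a dummy heuristic (none of this is GLiClass's actual API):

```python
# Toy contrast: per-pair cross-encoding vs single-pass multi-label scoring.
calls = {"cross": 0, "single": 0}

def cross_encoder_classify(text, labels):
    scores = {}
    for label in labels:                      # N forward passes for N labels
        calls["cross"] += 1
        scores[label] = float(label.lower() in text.lower())
    return scores

def single_pass_classify(text, labels):
    calls["single"] += 1                      # one pass handles all labels
    return {label: float(label.lower() in text.lower()) for label in labels}

labels = ["sports", "politics", "finance", "weather"]
text = "The finance minister commented on the weather before the match."
a = cross_encoder_classify(text, labels)
b = single_pass_classify(text, labels)
assert a == b                                # same answers
print(calls)                                 # cross grows with label count, single stays at 1
```

This is why latency can stay near-flat as taxonomies scale: label count changes the size of one input, not the number of passes.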

Knowledgator @knowledgator:
🧠 One lightweight model to classify, verify, and guard: introducing GLiClass-Instruct

GLiClass started as a fast zero-shot topic classifier rivaling cross-encoders at a fraction of the cost. But topic classification wasn't enough. Today we're launching GLiClass-Instruct: a multitask model that can handle various tasks via sequence classification.

✨ What's new
• Hierarchical labeling for complex taxonomies
• Few-shot in-context learning (no fine-tuning)
• Natural language prompting for task control
• EWC to add skills without forgetting old ones
• 3× faster inference with FlashDeBERTa

🚀 New multi-task capabilities
Beyond topic + sentiment classification, GLiClass-Instruct now supports:
• Hallucination detection (is an answer grounded in context?)
• Rule-following verification (does content follow your guidelines?)
• Safety classification (prompt injections, jailbreaks, harmful requests)

These are key building blocks for agents, RAG pipelines, and LLM guardrails, where every input/output must be screened and verified at minimal latency.

🔗 GitHub: github.com/Knowledgator/G…
🤗 Models: huggingface.co/knowledgator/g…
🌍 More: knowledgator.com
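Hallucination detection, as described above, is just sequence classification over a (context, answer) pair with labels like "grounded" / "not grounded". A real GLiClass-Instruct model learns this decision from data; the token-overlap heuristic below is only a hypothetical stand-in to make the framing concrete:

```python
# Toy "is this answer grounded in the context?" classifier.
# The overlap heuristic is illustrative only, not what the model computes.

def _tokens(s):
    return [t.strip(".,!?;:") for t in s.lower().split()]

def grounded_label(context, answer, threshold=0.6):
    ctx = set(_tokens(context))
    ans = _tokens(answer)
    if not ans:
        return "not_grounded"
    overlap = sum(t in ctx for t in ans) / len(ans)
    return "grounded" if overlap >= threshold else "not_grounded"

context = "GLiNER reached almost 30k downloads per day after the new release."
print(grounded_label(context, "GLiNER reached almost 30k downloads per day."))
print(grounded_label(context, "GLiNER was downloaded a billion times."))
```

The same two-text-plus-labels shape covers rule-following and safety checks too, which is what makes one lightweight classifier usable as a general guardrail.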
Knowledgator retweeted
Ihor Stepanov @ihor_step:
🚀 Exciting news: GLiNER meets Transformers v5!

We've just released a new version of GLiNER, packed with major updates and improvements:

▪️ Transformers v5 support: fully compatible with the latest Hugging Face Transformers updates.
▪️ Improved FlashDeBERTa integration: train GLiNER much faster and handle longer sequences more efficiently.
▪️ New architecture types: an advanced token-level subarchitecture that enables recognition of spans of any length. This was a game changer for our entity linking models, and the new token-level type works for both uni-encoder and bi-encoder setups.
▪️ Span-constrained prediction: you can now provide input spans, ensuring GLiNER predictions are always constrained to them. This enables chaining multiple GLiNER models for more weighted and controllable results.
▪️ Plus several smaller fixes and improvements.

📦 Upgrade now: pip install gliner -U

🔗 Resources:
• Entity Linking models: huggingface.co/collections/kn…
• Entity Linking framework (GLinker): github.com/Knowledgator/G…
• GLiNER repository: github.com/urchade/GLiNER

🙏 Huge thanks to all contributors, especially @bioMikeee for fixing bugs, @tomaarsen for helping navigate the Transformers v5 update, and @urchadeDS for reviewing the changes.
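The span-constrained prediction idea above boils down to restricting candidate predictions to a caller-supplied set of spans, which is what lets one model's output gate the next in a chain. A minimal pure-Python sketch (not GLiNER's actual API; the tuples and names are invented for illustration):

```python
# Keep only predictions that fall exactly on an allowed input span.

def constrain_to_spans(predictions, allowed_spans):
    """predictions: list of (start, end, label, score);
    allowed_spans: iterable of (start, end) pairs."""
    allowed = set(allowed_spans)
    return [p for p in predictions if (p[0], p[1]) in allowed]

preds = [(0, 4, "person", 0.91), (6, 12, "city", 0.55), (14, 20, "org", 0.80)]
spans = [(0, 4), (14, 20)]     # e.g., spans proposed by an upstream model
print(constrain_to_spans(preds, spans))
```

In a chained setup, the upstream model's accepted spans become `allowed_spans` for the downstream model, so downstream scores re-rank a fixed candidate set instead of free-form decoding.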
𝚐𝔪𝟾𝚡𝚡𝟾 @gm8xx8:
A solid release from Knowledgator: GLiNER-bi-V2, a bi-encoder upgrade to GLiNER for zero-shot NER.

Core change vs original GLiNER: the bi-encoder decouples text encoding from label encoding, so you can precompute/cache label embeddings and score against 1k+ entity types with near-constant inference cost (and constant text-encode memory regardless of label count).

Stack: ModernBERT-based text encoder (Ettin) + sentence-transformer/BGE label encoder, with optional cross-attn fusion.

The value is system-level efficiency: cheaper scaling, faster taxonomy iteration, and realistic million-entity linking via GLiNKER to Wikidata-scale KBs.

Numbers: 61.5 Micro-F1 (CrossNER, zero-shot) and up to 130× throughput at 1024 labels with precomputed embeddings. The 194M base reaches ~98% of the large model's accuracy at ~2.6× higher throughput; Large adds ~+1.2 F1 at ~2–3× lower throughput.
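The precompute/cache pattern described above can be sketched with a toy bi-encoder: the label encoder runs once per unique label, its output is cached, and scoring each new text costs one text encoding plus cheap dot products. The hash-based "embeddings" are dummies standing in for real transformer encoders:

```python
# Toy bi-encoder: cached label embeddings + per-text encoding.
import hashlib

encoder_calls = {"label": 0, "text": 0}

def embed(s, kind):
    encoder_calls[kind] += 1                  # count simulated encoder passes
    h = hashlib.sha256(s.encode()).digest()
    return [b / 255 for b in h[:8]]           # tiny fake embedding

label_cache = {}

def label_embedding(label):
    if label not in label_cache:              # encode each label once, then reuse
        label_cache[label] = embed(label, "label")
    return label_cache[label]

def score(text, labels):
    t = embed(text, "text")                   # exactly one text encoding per call
    return {lab: sum(a * b for a, b in zip(t, label_embedding(lab)))
            for lab in labels}

labels = ["person", "city", "organization"]
score("Ada Lovelace lived in London.", labels)
score("Knowledgator is an ML research company.", labels)
print(encoder_calls)   # label encoder ran once per label, not once per (text, label)
```

This is why inference cost stays near-constant as the label set grows: with a cross-attention or uni-encoder design, every new label would re-enter the expensive encoder alongside the text.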
Knowledgator @knowledgator:
🚀 Meet GLinker: an ultra-fast, modular, zero-shot entity linking system.

Back in 2024, we developed the GLiNER bi-encoder, which enabled zero-shot NER across hundreds of entity types. But that was just the start. Our real goal was linking text to millions of entities dynamically, without retraining. After ~2 years of research & engineering, it's here.

github.com/Knowledgator/G…
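To make "linking text to millions of entities dynamically" concrete, here is a hypothetical sketch of the modular shape such a system can take: an NER stage proposes mention strings, then a lightweight alias index maps each mention to knowledge-base candidates with no retraining. The tiny dict stands in for a Wikidata-scale store, and nothing here is GLinker's real API:

```python
# Toy entity linking: alias index over a mini knowledge base.
kb = {
    "Q90":  {"name": "Paris",  "aliases": {"paris", "city of light"}},
    "Q142": {"name": "France", "aliases": {"france", "french republic"}},
}

# Invert the KB once: alias string -> entity id.
alias_index = {alias: qid for qid, e in kb.items() for alias in e["aliases"]}

def link(mentions):
    """mentions: surface strings proposed by an upstream NER stage."""
    return {m: alias_index.get(m.lower()) for m in mentions}

print(link(["Paris", "French Republic", "Atlantis"]))
```

Because the KB side is an index built offline, adding or swapping entities is a data change, not a training run, which is what "dynamic, without retraining" buys you.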
Knowledgator retweeted
Ihor Stepanov @ihor_step:
You can now bake a SoTA zero-shot NER model in 7 minutes 🍞

I still remember my first zero-shot NER model back in 2022. It required millions of examples and days of training to get something usable.

Fast forward to today: the same model class can be trained in minutes, literally for ~$0.30, using a modern NVIDIA A6000 Pro rented on Google Cloud spot instances.

This is possible thanks to multiple optimizations in GLiNER, including:
🔹 FlashDeBERTa
🔹 Sequence packing
🔹 Better data
🔹 Improved training procedures

🔗 Check how to train GLiNER here: urchade.github.io/GLiNER/trainin…

What's important is that this progress didn't come from a single breakthrough. It required advances across model architecture and optimizations, data quality, GPUs, and infrastructure. AI progress is fundamentally multidimensional, and when these dimensions compound, the result can look exponential. And we're still early 🔥
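Sequence packing, one of the optimizations named above, means concatenating short training examples into near-full buffers so fewer tokens are wasted on padding. A greedy first-fit sketch over example lengths (the exact packing strategy GLiNER uses may differ; this just shows the idea):

```python
# Greedy first-fit sequence packing: place each example in the first
# buffer it fits in, opening a new buffer only when none has room.

def pack_sequences(lengths, max_len):
    buffers = []
    for n in sorted(lengths, reverse=True):   # longest-first improves fill rate
        for buf in buffers:
            if sum(buf) + n <= max_len:
                buf.append(n)
                break
        else:
            buffers.append([n])
    return buffers

lengths = [120, 48, 200, 64, 30, 256, 90]
packed = pack_sequences(lengths, max_len=512)
assert all(sum(b) <= 512 for b in packed)
print(len(packed), "buffers instead of", len(lengths), "padded rows")
```

Seven examples collapse into two 512-token buffers here, so the model sees mostly real tokens per step instead of padding, which is a large part of how wall-clock training time drops to minutes.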
Knowledgator @knowledgator:
FlashDeBERTa is already available in our zero-shot classification framework; check the "flash-attention-backends" section of the README: github.com/Knowledgator/G…
Knowledgator @knowledgator:
💡 What's new:
🔹 2–5× efficiency vs torch DeBERTa
🔹 Lower memory footprint
🔹 Forward + backward support
🔹 3 kernels auto-selected by input characteristics
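"Kernels auto-selected by input characteristics" means a dispatcher picks one of several implementations per call based on properties of the input. The thresholds and kernel names below are invented for illustration; FlashDeBERTa's real heuristics live in its kernel code:

```python
# Toy kernel dispatcher: choose an implementation from input shape.

def pick_kernel(seq_len, head_dim):
    if seq_len <= 128:
        return "short_seq_kernel"     # latency-optimized, no tiling needed
    if head_dim <= 64:
        return "tiled_kernel"         # a whole head fits in fast shared memory
    return "fallback_kernel"          # general path for everything else

print(pick_kernel(64, 64))            # short sequences take the cheap path
print(pick_kernel(2048, 64), pick_kernel(2048, 128))
```

The benefit is that no single kernel has to be fastest everywhere: each path is tuned for its regime, and callers never choose manually.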
Knowledgator @knowledgator:
🚀 FlashDeBERTa v2 is out: faster, leaner, now with backward support.

DeBERTa keeps proving that great architecture ages well. With new optimized kernels, FlashDeBERTa delivers 2–5× speedups, lower memory use, and full training support.

github.com/Knowledgator/F…
Knowledgator retweeted
Ihor Stepanov @ihor_step:
🚀 Big News for the GLiNER Community! We're rolling out one of the largest updates in GLiNER's history — an upgrade that makes the framework more mature, flexible, and reliable than ever. Here's what's new 🧵👇
Knowledgator @knowledgator:
Open-source works best with community input ❤️

If you've used GLiNER, GLiClass, or Comprehend-it, we want to hear from you.
👍 What worked for you?
🛠️ What needs improvement?
💭 What should we build next?

Feedback form: docs.google.com/forms/d/e/1FAI…