ZiniuYu (@ZiniuYu)

41 posts

New York, USA · Joined September 2015
80 Following · 34 Followers
ZiniuYu reposted
Jina AI (@JinaAI_)
Introducing jina-reranker-m0: our new multilingual, multimodal reranker model for ranking visual documents across multiple languages. It accepts a query alongside a collection of visually rich document images, including pages with text, figures, tables, infographics, and varied layouts, spanning multiple domains and over 29 languages.
3 replies · 35 reposts · 227 likes · 20.2K views
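The tweet above describes the model's input contract: one text query plus a collection of document images. A minimal sketch of what assembling such a rerank request could look like; the endpoint shape and field names here are illustrative assumptions modeled on common rerank APIs, not the documented jina-reranker-m0 schema.

```python
def build_rerank_request(query, image_urls, model="jina-reranker-m0", top_n=3):
    """Assemble a rerank payload: one text query against visual documents.

    ASSUMPTION: the `query`/`documents`/`top_n` field names mirror typical
    rerank APIs; verify against the official docs before use.
    """
    return {
        "model": model,
        "query": query,
        "top_n": top_n,
        "documents": [{"image": url} for url in image_urls],
    }

payload = build_rerank_request(
    "quarterly revenue table",
    ["https://example.com/page1.png", "https://example.com/page2.png"],
)
```

The point is the shape of the problem: unlike a text reranker, each candidate document is an image (a rendered page), so no parsing or OCR step appears in the request.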
ZiniuYu reposted
Elastic (@elastic)
Elasticsearch Open Inference API now supports @JinaAI_! Developers can build search and RAG applications using the latest Jina AI embedding and reranking models without additional integration or costs. Learn more: go.es.io/4i8sNOs
0 replies · 8 reposts · 27 likes · 4K views
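Elastic's Open Inference API registers third-party providers as named inference endpoints. A hedged sketch of what the request body for wiring up a Jina AI reranker could look like; the `jinaai` service name and the exact `service_settings` keys are assumptions based on how other inference services are configured, so check the linked Elastic docs for the real schema.

```python
def inference_endpoint_body(api_key, model_id="jina-reranker-v2-base-multilingual"):
    """Body for something like `PUT _inference/rerank/<endpoint-id>`.

    ASSUMPTION: mirrors the `service` + `service_settings` shape used by
    other Elastic inference providers; the key names here are illustrative.
    """
    return {
        "service": "jinaai",
        "service_settings": {
            "api_key": api_key,
            "model_id": model_id,
        },
    }

body = inference_endpoint_body("YOUR_JINA_API_KEY")
```

Once such an endpoint exists, queries reference it by its endpoint id rather than carrying provider credentials per request, which is what makes the "no additional integration" claim work.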
ZiniuYu reposted
Jina AI (@JinaAI_)
FREE. DEEP. SEARCH. No bullet points, no lengthy report, no ads, no login, no nonsense design. You ask, we answer. Pure & Deep. search.jina.ai
10 replies · 56 reposts · 309 likes · 22.8K views
ZiniuYu reposted
Jina AI (@JinaAI_)
Dear Readers, you'll ❤️ this: Introducing ReaderLM-v2, a 1.5B small language model for HTML-to-Markdown conversion and HTML-to-JSON extraction with exceptional quality. Thanks to the new training paradigm and higher-quality training data, ReaderLM-v2 is a significant leap forward from its predecessor, particularly in handling long context and Markdown syntax. While the first generation approached HTML-to-Markdown conversion as a "selective-copy" task, v2 treats it as a true translation process. This enables the model to masterfully leverage Markdown syntax, excelling at generating complex elements like code fences, nested lists, tables, and LaTeX equations. You can use ReaderLM-v2 today via Reader API, HuggingFace, AWS SageMaker, etc.
22 replies · 74 reposts · 572 likes · 58.5K views
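Because ReaderLM-v2 is an instruction-following language model rather than a rule-based converter, the conversion is driven by a prompt that wraps the raw HTML. A minimal sketch of building such a prompt; the instruction wording and chat-message format below are illustrative assumptions, not the model card's exact template.

```python
def html_to_markdown_messages(html: str):
    """Chat-style messages asking the model to convert HTML to Markdown.

    ASSUMPTION: the instruction text is a plausible paraphrase of an
    HTML-to-Markdown prompt for ReaderLM-v2, not the official template.
    """
    instruction = (
        "Extract the main content from the given HTML "
        "and convert it to Markdown format."
    )
    return [{"role": "user", "content": f"{instruction}\n```html\n{html}\n```"}]

messages = html_to_markdown_messages("<h1>Hello</h1><p>World</p>")
```

Framing conversion as translation (HTML in, Markdown out) rather than selective copying is what lets the model emit structures like nested lists and code fences that never appear verbatim in the source HTML.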
ZiniuYu reposted
Jina AI (@JinaAI_)
Last message of 2024! Some may have noticed the recent upgrade of our search foundation user experience. Today, we're proud to announce our SOC 2 Type I and Type II compliance. This milestone validates our enterprise-grade security standards and our commitment to providing a trustworthy service and protecting your data. Thanks to Prescient Assurance for their audit process, and to our incredible team for making this happen!
3 replies · 8 reposts · 17 likes · 8.8K views
ZiniuYu reposted
Jeremy Howard (@jeremyphoward)
I'll get straight to the point. We trained 2 new models. Like BERT, but modern. ModernBERT. Not some hypey GenAI thing, but a proper workhorse model, for retrieval, classification, etc. Real practical stuff. It's much faster, more accurate, longer context, and more useful. 🧵
127 replies · 655 reposts · 4.7K likes · 436.9K views
ZiniuYu reposted
Nan Wang (@nanwang_t)
Last call! Don’t miss the chance to dive into the latest multimodal RAG approach. 🔍📖
Quoted: Jina AI (@JinaAI_)

No parsing or OCR; No multi-vector or late interaction! VisRAG from @TsinghuaNLP outperforms TextRAG by addressing RAG bottlenecks at both stages: achieving higher retrieval accuracy and better answer generation via multimodal reasoning. We're thrilled to invite @dgdsxyushi to present his VisRAG work and share his hot take on how pure vision-based pipelines can better generalize to real-world scenarios.

0 replies · 1 repost · 3 likes · 271 views
ZiniuYu reposted
Jina AI (@JinaAI_)
Classification feels "classic", IYKWIM; how about naming 3 classification problems that have newly emerged and are actually cool? We'll start: jina.ai/news/jina-clas… 1. routing requests to LLMs for load-balancing and cost-efficiency; 2. detecting whether Jina Reader is receiving genuine content or blocked messages; 3. filtering statements from opinions in long documents and sending the former to a grounding API. Read more in the blog post below.
1 reply · 5 reposts · 13 likes · 1.8K views
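Use case 1 above, routing requests to LLMs, is just a classification problem over incoming queries. A toy sketch of the idea, with hand-written keyword rules standing in for a trained classifier; the tier names and routing rules are purely illustrative placeholders.

```python
def route_request(query: str) -> str:
    """Pick a backend LLM tier for a query.

    ASSUMPTION: a real router would use a trained classifier (e.g. one
    served behind a classification API); these keyword heuristics and
    model-tier names exist only to illustrate the routing decision.
    """
    q = query.lower()
    # Queries that look like multi-step reasoning -> expensive tier.
    if any(kw in q for kw in ("prove", "derive", "step by step")):
        return "large-reasoning-model"
    # Very short lookups -> cheap, fast tier.
    if len(q.split()) <= 6:
        return "small-fast-model"
    return "mid-tier-model"
```

The payoff named in the tweet, load-balancing and cost-efficiency, comes from the fact that most traffic falls into the cheap branches, so the expensive model only sees the queries that need it.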
ZiniuYu reposted
Jina AI (@JinaAI_)
Jina-ColBERT-v2 is here. jina.ai/news/jina-colb… Superior retrieval performance vs. the original ColBERT-v2 from @stanfordnlp (+6.5%) and our previous jina-colbert-v1-en (+5.4%). Multilingual support for 89 languages, plus programming languages. User-controlled output embedding sizes (128/96/64-dim) through Matryoshka representation learning, and finally: 8192-token length!
4 replies · 32 reposts · 199 likes · 31.9K views
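ColBERT-style models are late-interaction models: every token gets its own embedding, and the query-document score is MaxSim (for each query token, take its best match over the document tokens, then sum). Matryoshka training is what lets the per-token vectors simply be truncated to 96 or 64 dims. A self-contained sketch of MaxSim on toy vectors; no model is loaded, and the raw dot products stand in for similarities over real normalized embeddings.

```python
def maxsim(query_vecs, doc_vecs, dim=None):
    """Late-interaction MaxSim score between token-embedding lists.

    `dim` optionally truncates vectors first, mimicking the Matryoshka
    128/96/64-dim output sizes mentioned in the announcement.
    """
    if dim is not None:
        query_vecs = [q[:dim] for q in query_vecs]
        doc_vecs = [d[:dim] for d in doc_vecs]
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    # For each query token: best-matching document token; then sum.
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

q = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]          # 2 query tokens
d = [[1.0, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 1.0]]  # 3 doc tokens
score = maxsim(q, d)  # 1.0 (token 1 best match) + 0.5 (token 2) = 1.5
```

This per-token matching is what distinguishes late interaction from single-vector retrieval: fine-grained term matches survive instead of being averaged away into one pooled embedding.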
ZiniuYu reposted
Jina AI (@JinaAI_)
🚀 jina.ai/news/jina-rera… Jina Reranker v2 is here! The best-in-class reranker for agentic RAG, featuring cross-lingual retrieval over 100+ languages, function-calling, table & code search, and ultra-fast inference! Available via API and @huggingface; AWS & Azure coming soon! Let's jump in:
1 reply · 29 reposts · 153 likes · 26.4K views
ZiniuYu reposted
Jina AI (@JinaAI_)
huggingface.co/spaces/AIR-Ben… Together with the research team @BAAIBeijing (the creator of bge-m3 embeddings), we are excited to release a new benchmark, AIR-Bench, on @huggingface, focusing on a fair out-of-domain evaluation for RAG & neural IR. AIR-Bench stands for Automated Heterogeneous Information Retrieval Benchmark. It differs from the classic MTEB in three key aspects:
1. It generates synthetic data for benchmarking, ensuring that tests never overlap with training, thus preventing the overfitting and data-leakage issues common in MTEB and BEIR.
2. It is designed for modern neural IR, LLM, and RAG pipelines, evaluating not only dense retrievers but also pipelines with or without rerankers, and it also evaluates search over very long documents.
3. It can be easily extended to new domains and languages, ensuring the benchmark remains relevant as users' interests continue to expand.
Interested? Read more 👇
5 replies · 28 reposts · 112 likes · 34.3K views
ZiniuYu reposted
Jina AI (@JinaAI_)
Two weeks ago, we started alpha-testing the new auto fine-tuning feature with some of our customers, and it's time to share some bitter and sour lessons we learned. For those who don't know, auto fine-tuning aims to be the "Devin for AI engineers." It receives an instruction from the user specifying how they want an embedding model to perform (e.g., legal/healthcare). Then, it calls LLM agents to generate synthetic data, fine-tune the model, evaluate and test it, and finally push the model and dataset to @huggingface . Users can then directly use them via SentenceTransformers. Essentially, the only input to the system is an instruction like "I want my embeddings to be good on insurance docs," and the system (LLM agents) takes care of everything a regular AI engineer would do in the inner-loop (i.e., everything before serving, as serving/DevOps would be the outer-loop).
4 replies · 20 reposts · 85 likes · 14.6K views
ZiniuYu reposted
Jina AI (@JinaAI_)
We are releasing two new open-source reranker models: jina-reranker-v1-turbo-en and jina-reranker-v1-tiny-en; the latter has only 33M parameters and four layers! 🤯 These two new rerankers deliver 5X faster inference than our jina-reranker-v1-base model, at only a small cost in quality. You can find both models on @huggingface. Turbo (6-layer, 37.8M parameters): huggingface.co/jinaai/jina-re… Tiny (4-layer, 33M parameters): huggingface.co/jinaai/jina-re…
3 replies · 28 reposts · 153 likes · 35.3K views
ZiniuYu reposted
Jina AI (@JinaAI_)
How did we train DE-EN, ES-EN & ZH-EN bilingual embeddings with 8192-token length? Is a bilingual model superior to a multilingual one? How do they perform on out-of-domain data? Find out in our latest publication. 👇 arxiv.org/abs/2402.17016
0 replies · 10 reposts · 44 likes · 16.1K views
ZiniuYu reposted
Jina AI (@JinaAI_)
😇 New week, new embeddings for you! Announcing jina-embeddings-v2-base-code, optimized for code search. This powerful model supports searches between English and 30 programming languages, all with an impressive 8K token length and SOTA performance! 🚀
2 replies · 22 reposts · 94 likes · 19.6K views
ZiniuYu reposted
Jina AI (@JinaAI_)
Start with a boost! 🚀 Get a massive head start with our embedding API - now offering 1,000,000 free tokens for each new API key, no credit card required. Ideal for personal & commercial projects. 8K length, high scalability, easy integration. Dive in now! jina.ai/embeddings?noc…
0 replies · 14 reposts · 23 likes · 2.4K views
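The embedding API above follows the familiar POST-with-JSON pattern. A hedged sketch of assembling a request body for it; the endpoint path in the comment and the OpenAI-style `model`/`input` field names are assumptions based on common embedding-API conventions, so verify them against the current docs before relying on them.

```python
def embeddings_request(texts, model="jina-embeddings-v2-base-en"):
    """Body for something like POST https://api.jina.ai/v1/embeddings.

    ASSUMPTION: `model` and `input` follow the OpenAI-style schema many
    embedding APIs use; the model name is illustrative here.
    """
    return {"model": model, "input": list(texts)}

body = embeddings_request(["hello world", "8K-token documents welcome"])
```

A request like this would also carry the API key in an `Authorization: Bearer` header, which is where the free-token quota per key would be accounted.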