ZiniuYu (@ZiniuYu)

41 posts

New York, USA · Joined September 2015
80 Following · 34 Followers
ZiniuYu reposted
Jina AI (@JinaAI_)
Introducing jina-reranker-m0: our new multilingual, multimodal reranker model for ranking visual documents across multiple languages. It accepts a query alongside a collection of visually rich document images, including pages with text, figures, tables, infographics, and varied layouts, spanning multiple domains and over 29 languages.
3 replies · 35 reposts · 227 likes · 20.2K views
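The tweet above describes the model's input contract: one text query plus a collection of document images. A minimal sketch of what assembling such a rerank request could look like; the endpoint shape and field names here are illustrative assumptions modeled on common rerank APIs, not the documented jina-reranker-m0 schema.

```python
def build_rerank_request(query, image_urls, model="jina-reranker-m0", top_n=3):
    """Assemble a rerank payload: one text query against visual documents.

    ASSUMPTION: the `query`/`documents`/`top_n` field names mirror typical
    rerank APIs; verify against the official docs before use.
    """
    return {
        "model": model,
        "query": query,
        "top_n": top_n,
        "documents": [{"image": url} for url in image_urls],
    }

payload = build_rerank_request(
    "quarterly revenue table",
    ["https://example.com/page1.png", "https://example.com/page2.png"],
)
```

The point is the shape of the problem: unlike a text reranker, each candidate document is an image (a rendered page), so no parsing or OCR step appears in the request.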
ZiniuYu reposted
Elastic (@elastic)
Elasticsearch Open Inference API now supports @JinaAI_! Developers can build search and RAG applications using the latest Jina AI embedding and reranking models without additional integration or costs. Learn more: go.es.io/4i8sNOs
0 replies · 8 reposts · 27 likes · 4K views
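Elastic's Open Inference API registers third-party providers as named inference endpoints. A hedged sketch of what the request body for wiring up a Jina AI reranker could look like; the `jinaai` service name and the exact `service_settings` keys are assumptions based on how other inference services are configured, so check the linked Elastic docs for the real schema.

```python
def inference_endpoint_body(api_key, model_id="jina-reranker-v2-base-multilingual"):
    """Body for something like `PUT _inference/rerank/<endpoint-id>`.

    ASSUMPTION: mirrors the `service` + `service_settings` shape used by
    other Elastic inference providers; the key names here are illustrative.
    """
    return {
        "service": "jinaai",
        "service_settings": {
            "api_key": api_key,
            "model_id": model_id,
        },
    }

body = inference_endpoint_body("YOUR_JINA_API_KEY")
```

Once such an endpoint exists, queries reference it by its endpoint id rather than carrying provider credentials per request, which is what makes the "no additional integration" claim work.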
ZiniuYu reposted
Jina AI (@JinaAI_)
FREE. DEEP. SEARCH. No bullet points, no lengthy report, no ads, no login, no nonsense design. You ask, we answer. Pure & Deep. search.jina.ai
10 replies · 56 reposts · 309 likes · 22.8K views
ZiniuYu reposted
Jina AI (@JinaAI_)
Dear Readers, you'll ❤️ this: Introducing ReaderLM-v2, a 1.5B small language model for HTML-to-Markdown conversion and HTML-to-JSON extraction with exceptional quality. Thanks to the new training paradigm and higher-quality training data, ReaderLM-v2 is a significant leap forward from its predecessor, particularly in handling long context and Markdown syntax. While the first generation approached HTML-to-Markdown conversion as a "selective-copy" task, v2 treats it as a true translation process. This enables the model to masterfully leverage Markdown syntax, excelling at generating complex elements like code fences, nested lists, tables, and LaTeX equations. You can use ReaderLM-v2 today via Reader API, HuggingFace, AWS SageMaker, etc.
22 replies · 74 reposts · 572 likes · 58.5K views
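Because ReaderLM-v2 is an instruction-following language model rather than a rule-based converter, the conversion is driven by a prompt that wraps the raw HTML. A minimal sketch of building such a prompt; the instruction wording and chat-message format below are illustrative assumptions, not the model card's exact template.

```python
def html_to_markdown_messages(html: str):
    """Chat-style messages asking the model to convert HTML to Markdown.

    ASSUMPTION: the instruction text is a plausible paraphrase of an
    HTML-to-Markdown prompt for ReaderLM-v2, not the official template.
    """
    instruction = (
        "Extract the main content from the given HTML "
        "and convert it to Markdown format."
    )
    return [{"role": "user", "content": f"{instruction}\n```html\n{html}\n```"}]

messages = html_to_markdown_messages("<h1>Hello</h1><p>World</p>")
```

Framing conversion as translation (HTML in, Markdown out) rather than selective copying is what lets the model emit structures like nested lists and code fences that never appear verbatim in the source HTML.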
ZiniuYu reposted
Jina AI (@JinaAI_)
Last message of 2024! Some may have noticed the recent upgrade of our search foundation user experience. Today, we're proud to announce our SOC 2 Type I and Type II compliance. This milestone validates our enterprise-grade security standards and our commitment to providing a trustworthy service and protecting your data. Thanks to Prescient Assurance for their audit process, and to our incredible team for making this happen!
3 replies · 8 reposts · 17 likes · 8.8K views
ZiniuYu reposted
Jeremy Howard (@jeremyphoward)
I'll get straight to the point. We trained 2 new models. Like BERT, but modern. ModernBERT. Not some hypey GenAI thing, but a proper workhorse model, for retrieval, classification, etc. Real practical stuff. It's much faster, more accurate, longer context, and more useful. 🧵
127 replies · 655 reposts · 4.7K likes · 436.9K views
ZiniuYu reposted
Nan Wang (@nanwang_t)
Last call! Don’t miss the chance to dive into the latest multimodal RAG approach. 🔍📖
Quoted: Jina AI (@JinaAI_)

No parsing or OCR; No multi-vector or late interaction! VisRAG from @TsinghuaNLP outperforms TextRAG by addressing RAG bottlenecks at both stages: achieving higher retrieval accuracy and better answer generation via multimodal reasoning. We're thrilled to invite @dgdsxyushi to present his VisRAG work and share his hot take on how pure vision-based pipelines can better generalize to real-world scenarios.

0 replies · 1 repost · 3 likes · 271 views
ZiniuYu reposted
Jina AI (@JinaAI_)
Classification feels "classic", IYKWIM; how about naming 3 classification problems that have newly emerged and are actually cool? We'll start: jina.ai/news/jina-clas… 1. routing requests to LLMs for load-balancing and cost-efficiency; 2. detecting whether Jina Reader is receiving genuine content or blocked messages; 3. filtering statements from opinions in long documents and sending the former to a grounding API. Read more in the blog post below.
1 reply · 5 reposts · 13 likes · 1.8K views
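Use case 1 above, routing requests to LLMs, is just a classification problem over incoming queries. A toy sketch of the idea, with hand-written keyword rules standing in for a trained classifier; the tier names and routing rules are purely illustrative placeholders.

```python
def route_request(query: str) -> str:
    """Pick a backend LLM tier for a query.

    ASSUMPTION: a real router would use a trained classifier (e.g. one
    served behind a classification API); these keyword heuristics and
    model-tier names exist only to illustrate the routing decision.
    """
    q = query.lower()
    # Queries that look like multi-step reasoning -> expensive tier.
    if any(kw in q for kw in ("prove", "derive", "step by step")):
        return "large-reasoning-model"
    # Very short lookups -> cheap, fast tier.
    if len(q.split()) <= 6:
        return "small-fast-model"
    return "mid-tier-model"
```

The payoff named in the tweet, load-balancing and cost-efficiency, comes from the fact that most traffic falls into the cheap branches, so the expensive model only sees the queries that need it.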
ZiniuYu reposted
Jina AI (@JinaAI_)
Jina-ColBERT-v2 is here. jina.ai/news/jina-colb… Superior retrieval performance vs. the original ColBERT-v2 from @stanfordnlp (+6.5%) and our previous jina-colbert-v1-en (+5.4%). Multilingual support for 89 languages, plus programming languages. User-controlled output embedding sizes (128/96/64-dim) through Matryoshka representation learning, and finally: 8192-token length!
4 replies · 32 reposts · 199 likes · 31.9K views
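ColBERT-style models are late-interaction models: every token gets its own embedding, and the query-document score is MaxSim (for each query token, take its best match over the document tokens, then sum). Matryoshka training is what lets the per-token vectors simply be truncated to 96 or 64 dims. A self-contained sketch of MaxSim on toy vectors; no model is loaded, and the raw dot products stand in for similarities over real normalized embeddings.

```python
def maxsim(query_vecs, doc_vecs, dim=None):
    """Late-interaction MaxSim score between token-embedding lists.

    `dim` optionally truncates vectors first, mimicking the Matryoshka
    128/96/64-dim output sizes mentioned in the announcement.
    """
    if dim is not None:
        query_vecs = [q[:dim] for q in query_vecs]
        doc_vecs = [d[:dim] for d in doc_vecs]
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    # For each query token: best-matching document token; then sum.
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

q = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]          # 2 query tokens
d = [[1.0, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 1.0]]  # 3 doc tokens
score = maxsim(q, d)  # 1.0 (token 1 best match) + 0.5 (token 2) = 1.5
```

This per-token matching is what distinguishes late interaction from single-vector retrieval: fine-grained term matches survive instead of being averaged away into one pooled embedding.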
ZiniuYu reposted
Jina AI (@JinaAI_)
🚀 jina.ai/news/jina-rera… Jina Reranker v2 is here! The best-in-class reranker for agentic RAG, featuring cross-lingual retrieval over 100+ languages, function-calling, table & code search, and ultra-fast inference! Available via API and @huggingface; AWS & Azure coming soon! Let's jump in:
1 reply · 29 reposts · 153 likes · 26.4K views
ZiniuYu reposted
Jina AI (@JinaAI_)
huggingface.co/spaces/AIR-Ben… Together with the research team @BAAIBeijing (the creator of bge-m3 embeddings), we are excited to release a new benchmark, AIR-Bench, on @huggingface, focusing on a fair out-of-domain evaluation for RAG & neural IR. AIR-Bench stands for Automated Heterogeneous Information Retrieval Benchmark. It differs from the classic MTEB in three key aspects:
1. It generates synthetic data for benchmarking, ensuring that tests never overlap with training, thus preventing the overfitting and data-leakage issues common in MTEB and BEIR.
2. It is designed for modern neural IR, LLM, and RAG pipelines, evaluating not only dense retrievers but also pipelines with or without rerankers, and it also evaluates search over very long documents.
3. It can be easily extended to new domains and languages, ensuring the benchmark remains relevant as users' interests continue to expand.
Interested? Read more 👇
5 replies · 28 reposts · 112 likes · 34.3K views
ZiniuYu reposted
Jina AI (@JinaAI_)
Two weeks ago, we started alpha-testing the new auto fine-tuning feature with some of our customers, and it's time to share some bitter and sour lessons we learned. For those who don't know, auto fine-tuning aims to be the "Devin for AI engineers." It receives an instruction from the user specifying how they want an embedding model to perform (e.g., legal/healthcare). Then, it calls LLM agents to generate synthetic data, fine-tune the model, evaluate and test it, and finally push the model and dataset to @huggingface . Users can then directly use them via SentenceTransformers. Essentially, the only input to the system is an instruction like "I want my embeddings to be good on insurance docs," and the system (LLM agents) takes care of everything a regular AI engineer would do in the inner-loop (i.e., everything before serving, as serving/DevOps would be the outer-loop).
4 replies · 20 reposts · 85 likes · 14.6K views
ZiniuYu reposted
Jina AI (@JinaAI_)
We are releasing two new open-source reranker models: jina-reranker-v1-turbo-en and jina-reranker-v1-tiny-en; the latter has only 33M parameters and four layers! 🤯 These two new rerankers deliver 5X faster inference than our jina-reranker-v1-base model, at only a small cost in quality. You can find both models on @huggingface. Turbo (6-layer, 37.8M parameters): huggingface.co/jinaai/jina-re… Tiny (4-layer, 33M parameters): huggingface.co/jinaai/jina-re…
3 replies · 28 reposts · 153 likes · 35.3K views
ZiniuYu reposted
Jina AI (@JinaAI_)
How did we train DE-EN, ES-EN & ZH-EN bilingual embeddings with 8192-token length? Is a bilingual model superior to a multilingual one? How do they perform on out-of-domain data? Find out in our latest publication. 👇 arxiv.org/abs/2402.17016
0 replies · 10 reposts · 44 likes · 16.1K views
ZiniuYu reposted
Jina AI (@JinaAI_)
😇 New week, new embeddings for you! Announcing jina-embeddings-v2-base-code, optimized for code search. This powerful model supports searches between English and 30 programming languages, all with an impressive 8K token length and SOTA performance! 🚀
2 replies · 22 reposts · 94 likes · 19.6K views
ZiniuYu reposted
Jina AI (@JinaAI_)
Start with a boost! 🚀 Get a massive head start with our embedding API - now offering 1,000,000 free tokens for each new API key, no credit card required. Ideal for personal & commercial projects. 8K length, high scalability, easy integration. Dive in now! jina.ai/embeddings?noc…
0 replies · 14 reposts · 23 likes · 2.4K views
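The embedding API above follows the familiar POST-with-JSON pattern. A hedged sketch of assembling a request body for it; the endpoint path in the comment and the OpenAI-style `model`/`input` field names are assumptions based on common embedding-API conventions, so verify them against the current docs before relying on them.

```python
def embeddings_request(texts, model="jina-embeddings-v2-base-en"):
    """Body for something like POST https://api.jina.ai/v1/embeddings.

    ASSUMPTION: `model` and `input` follow the OpenAI-style schema many
    embedding APIs use; the model name is illustrative here.
    """
    return {"model": model, "input": list(texts)}

body = embeddings_request(["hello world", "8K-token documents welcome"])
```

A request like this would also carry the API key in an `Authorization: Bearer` header, which is where the free-token quota per key would be accounted.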