Jordan
@ZedDou1
🥦

454 posts
Joined May 2013
170 Following · 18 Followers
Jordan
Jordan@ZedDou1·
@JinaAI_ @counting_words Therefore, shouldn't the whole approach work even without actually injecting the watermarks, since we should have some degree of semantic preservation between the original, watermarked, and paraphrased text, no matter whether we paraphrase the original or the watermarked text?
Jina AI
Jina AI@JinaAI_·
@counting_words it's pretty much like the original input. you can find some examples in the blog post.
Jina AI
Jina AI@JinaAI_·
Text watermarking using — surprise — embedding models?! And those watermarks persist after paraphrasing & translation—one of the most "out-of-domain" usages of embeddings we learned at EMNLP2024. It leverages the long-context and cross-lingual features of jina-embeddings-v3 to create a robust watermark system. But first, what is a good text watermark?
Jordan
Jordan@ZedDou1·
@TejasReddyS1 @victorialslocum @AnthropicAI Basically, your first request that includes the whole document will take time, but once it's done, the key and value vectors of your document can be cached and won't need to be re-computed. You only have to process the extra text in your prompt (e.g., the chunk here)
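Jordan's point can be sketched with a toy prefix cache. The real mechanism caches per-layer key/value tensors inside the model; `process_tokens`, `run_prompt`, and the dict-based cache here are all illustrative stand-ins, not any provider's API:

```python
def process_tokens(tokens):
    # Stand-in for the expensive attention computation over `tokens`.
    return [f"kv({t})" for t in tokens]

cache = {}  # document prefix (as a tuple) -> its precomputed "KV vectors"

def run_prompt(document_tokens, extra_tokens):
    """Reuse cached KV vectors for the document prefix; only the extra
    text (e.g. the chunk) is processed on subsequent requests."""
    key = tuple(document_tokens)
    if key not in cache:
        cache[key] = process_tokens(document_tokens)  # paid once, first request
        fresh = len(document_tokens) + len(extra_tokens)
    else:
        fresh = len(extra_tokens)  # cache hit: only the chunk is processed
    return cache[key] + process_tokens(extra_tokens), fresh

doc = ["long", "document", "tokens"]
_, first_cost = run_prompt(doc, ["chunk", "1"])
_, second_cost = run_prompt(doc, ["chunk", "2"])
print(first_cost, second_cost)  # second request only pays for the chunk
```

After the first request the document's "KV vectors" are reused, so every later chunk only pays for its own tokens.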
Victoria Slocum
Victoria Slocum@victorialslocum·
Contextual Retrieval is a new RAG method introduced by @AnthropicAI, where context is automatically inserted into each chunk by an LLM.

A naive chunking method will create a vector embedding independently for each chunk, and then RAG systems use these embeddings to find chunks matching the query. But there's an issue, lost context, where chunks lose the context of the original document.

Anthropic's method, Contextual Retrieval, is a brute-force strategy to combat the issue of lost context:
1. Each chunk is sent to the LLM alongside its full document 📃
2. An LLM adds relevant context to every chunk 🧩
3. This results in richer and more informative embeddings 🔎

@JinaAI_ developed a separate strategy, late chunking, which is a different way of handling lost context. In late chunking, the full document is first embedded and then chunked afterwards, so that full-document context is included in the chunk embeddings.

Both methods solve this problem, but how do they stack up?
💻 Contextual retrieval requires running an LLM for each chunk (expensive)
📊 Both methods can include the missing context in the chunk embeddings

While contextual retrieval could cost a lot, you could use a smaller LM for this task, or use Anthropic's new prompt caching feature to bring these costs down.

Take a look at the similarity scores below for a basic example. Both late chunking and contextual retrieval perform similarly and improve on naive chunking!

This method is still young, so we still don't know what will work best in a real-world RAG application 👀

See it in action in this notebook by @DanielW966: github.com/weaviate/recip…
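The late-chunking side of the comparison can be sketched like this. The toy `token_embeddings` function stands in for a real long-context encoder (which would return contextualized per-token vectors); only the pooling step reflects the actual technique:

```python
def token_embeddings(document):
    # Stand-in for a long-context encoder: real late chunking uses the
    # contextualized per-token outputs over the FULL document; here each
    # "token vector" is just [token length, token position].
    toks = document.split()
    return toks, [[float(len(t)), float(i)] for i, t in enumerate(toks)]

def late_chunk(document, chunk_spans):
    """Embed the whole document first, then mean-pool token vectors per
    chunk span, so every chunk vector was computed with full-document
    context rather than in isolation."""
    _, vecs = token_embeddings(document)
    pooled = []
    for start, end in chunk_spans:
        span = vecs[start:end]
        pooled.append([sum(col) / len(span) for col in zip(*span)])
    return pooled

chunk_vecs = late_chunk("a bb ccc dddd", [(0, 2), (2, 4)])
print(chunk_vecs)  # one pooled vector per chunk
```

The key design point is the order of operations: encode first, chunk afterwards, which is what lets chunk embeddings keep document-level context without an extra LLM call per chunk.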
Jordan
Jordan@ZedDou1·
@TejasReddyS1 @victorialslocum @AnthropicAI To me, some local context would be enough for that (while feeding less noise into the context generation), but then you wouldn't be able to fully exploit prompt caching. To be honest, the presented approach feels like an ad for it
Jordan
Jordan@ZedDou1·
@TejasReddyS1 @victorialslocum @AnthropicAI From my understanding, that's a limitation of the presented approach. However, I doubt the usefulness of feeding something like a 100-page PDF just to prepend 50-100 tokens of context to the chunk.
Jordan
Jordan@ZedDou1·
@michael_g_u @arankomatsuzaki Thanks for the answer! I understand the explanation and it definitely makes sense to me, but I'm still struggling with the notion of the right/left side. To me, we usually compare a query to a piece of text, and there's no notion of positioning; that's why I'm struggling to understand
Michael Günther
Michael Günther@michael_g_u·
@ZedDou1 @arankomatsuzaki The similarity is symmetric but the loss is not. You always compare the similarity of a text from the left side to the similarities of all texts on the right side. Bidirectional means that we also compare the right sides to all left sides, so you have negatives on both sides.
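One way to read Michael's explanation is as a plain-Python InfoNCE sketch. The matrix values are made up, and this is a toy reading of the idea, not the model's actual training code:

```python
import math

def directional_loss(sim):
    """InfoNCE over rows: each left-side text i is compared against ALL
    right-side texts (row i of the queries-by-documents similarity matrix);
    the matching pair sits on the diagonal."""
    n = len(sim)
    total = 0.0
    for i in range(n):
        denom = sum(math.exp(s) for s in sim[i])
        total -= math.log(math.exp(sim[i][i]) / denom)
    return total / n

def bidirectional_loss(sim):
    # "Bidirectional": also compare right sides to all left sides, i.e.
    # transpose and average, so there are negatives on both sides.
    sim_t = [list(col) for col in zip(*sim)]
    return 0.5 * (directional_loss(sim) + directional_loss(sim_t))

# sim(a, b) for one PAIR is symmetric, but the queries-by-documents matrix
# is generally not, so the two directions give different loss values.
sim = [[0.9, 0.2],
       [0.6, 0.8]]
print(directional_loss(sim), bidirectional_loss(sim))
```

This shows why the similarity function can be symmetric while the loss is not: the softmax normalization runs along one axis at a time.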
Aran Komatsuzaki
Aran Komatsuzaki@arankomatsuzaki·
jina-embeddings-v3: Multilingual Embeddings With Task LoRA - Achieves SotA perf on multilingual and long-context retrieval - Outperforms proprietary models on English - Up to 8k context length hf: huggingface.co/jinaai/jina-em… abs: arxiv.org/abs/2409.10173 Edit: Wait for the hf page to be updated soon
Jordan
Jordan@ZedDou1·
@tonywu_71 @bclavie Just curious whether you tried the 3 ways of selecting the tokens to pool as mentioned in the blog? And if so did you also observe that hierarchical clustering for selecting tokens works better? In any case, well done! 😁
Tony Wu
Tony Wu@tonywu_71·
✨Remember @bclavie's awesome blog post on Token Pooling for ColBERT? Ofc we had to try it on ColPali, and the results are exciting: we managed to reduce the total number of vectors in document embeddings by ≈66.7% while retaining ≈97.8% of the original performance! 🧵(1/n)
Ben Clavié@bclavie

🥁🥁 New blog post out (link in thread), w/ two aims: 🤓 Providing a clear, hopefully easy-to-read intro to ColBERT, without assuming you've ever used it. 🏊Introducing ColBERT Token Pooling ✨: You can reduce the size of ColBERT indexes by 66% with barely any performance hit!
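A minimal sketch of the simplest of the pooling strategies discussed in the thread: mean-pooling consecutive token vectors. Per Jordan's question, the blog also covers cleverer token selection (hierarchical clustering reportedly works better); this toy only illustrates the mechanics, with made-up 2-d vectors:

```python
def sequential_pool(token_vecs, pool_factor=3):
    """Mean-pool every `pool_factor` consecutive token vectors, shrinking a
    ColBERT-style multi-vector representation by about (1 - 1/pool_factor),
    e.g. ~66% for pool_factor=3."""
    pooled = []
    for i in range(0, len(token_vecs), pool_factor):
        group = token_vecs[i:i + pool_factor]
        pooled.append([sum(col) / len(group) for col in zip(*group)])
    return pooled

vecs = [[float(i), float(i % 2)] for i in range(9)]  # 9 toy token vectors
compressed = sequential_pool(vecs, pool_factor=3)
print(len(compressed))  # 3 vectors instead of 9
```

The compression ratio reported in the thread (~66.7% fewer vectors) corresponds to a pool factor of 3.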

Jordan
Jordan@ZedDou1·
@diervo @sbergman @omarsar0 @simonw From my understanding, they simply use it as an encoder for the query/chunks and compute the regular cosine similarity between the resulting embeddings to get the top-k chunks
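The encoder-plus-cosine retrieval Jordan describes, as a toy sketch with made-up 2-d embeddings standing in for real encoder outputs:

```python
import math

def cosine(a, b):
    # Cosine similarity between two dense vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query_vec, chunk_vecs, k=2):
    # Rank chunk indices by cosine similarity to the query embedding.
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]

q = [1.0, 0.0]                                  # toy query embedding
chunks = [[0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]   # toy chunk embeddings
print(top_k(q, chunks, k=2))  # → [0, 2]
```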
Diego Ferreiro Val
Diego Ferreiro Val@diervo·
Google now allows you to cache immutable parts of the prompt (imagine the set of articles only changes so often); I assume they still charge you, though. What I don't understand from this paper is how this dragon retriever works: is it used to create the embeddings first and then for retrieval?
elvis
elvis@omarsar0·
Very interesting study comparing RAG and long-context LLMs. Main findings:
- long-context LLMs outperform RAG on average performance
- RAG is significantly less expensive

On top of this, they also propose Self-Route, leveraging self-reflection to route queries to RAG or LC. They report that Self-Route significantly reduces computational cost while maintaining comparable performance to LC.

Interesting result: "On average, LC surpasses RAG by 7.6% for Gemini-1.5-Pro, 13.1% for GPT-4O, and 3.6% for GPT-3.5-Turbo. Noticeably, the performance gap is more significant for the more recent models (GPT-4O and Gemini-1.5-Pro) compared to GPT-3.5-Turbo, highlighting the exceptional long-context understanding capacity of the latest LLMs."

Again, not sure why Claude was left out of the analysis. I would love to see that, including other custom LLMs trained to perform better at RAG.

I am not entirely convinced that long-context LLMs can generally outdo RAG systems today. But I think it's interesting to see a combination of the approaches, which is something I've been advocating for recently.
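The Self-Route idea can be sketched as follows. `llm` is a hypothetical stand-in model (it just pattern-matches on the prompt), and the prompt wording is illustrative, not the paper's:

```python
def llm(prompt):
    # Stand-in model: "declines" whenever the supplied chunks look bad.
    if "INSUFFICIENT" in prompt:
        return "unanswerable"
    return "answer from chunks"

def self_route(query, chunks, full_document):
    """First try the cheap RAG path, letting the model reflect on whether
    the retrieved chunks suffice; fall back to the expensive long-context
    path only when it declines."""
    rag_prompt = (f"Answer from these chunks, or reply 'unanswerable':\n"
                  f"{chunks}\nQ: {query}")
    rag_answer = llm(rag_prompt)
    if rag_answer != "unanswerable":
        return rag_answer, "rag"               # cheap path taken
    lc_prompt = f"{full_document}\nQ: {query}"
    return llm(lc_prompt), "long-context"      # expensive fallback

ans, route = self_route("q", "good chunk", "whole doc")
print(route)  # routes to RAG when the chunks suffice
```

Most queries take the cheap branch, which is where the reported cost savings come from.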
Jordan
Jordan@ZedDou1·
@tonywu_71 Thank you for the clear answers and the amazing work! Really like the approach and especially the explainability part! Keep up the good work! 🚀
Tony Wu
Tony Wu@tonywu_71·
@ZedDou1 However, I personally really like the idea of running VQA directly from ColPali's document embeddings but it would require further model training. Our team is considering experimenting with this, so stay tuned!
Tony Wu
Tony Wu@tonywu_71·
🚀 How can we make sense of visual documents at a large scale for retrieval? With our new paper "ColPali", we propose to leverage VLMs to construct efficient multi-vector embeddings in the visual space for document retrieval: arxiv.org/abs/2407.01449 📝🔍 (1/10)
Jordan
Jordan@ZedDou1·
@tonywu_71 From my understanding, even though OCR is not needed at indexing time, it would still be required at generation time so we can include the text in the prompt, wouldn't it? Or do you directly use the projected patch embeddings to represent the context the LLM should use to answer the query?
Tony Wu
Tony Wu@tonywu_71·
🧩 Current document retrieval systems are complex and require multiple steps like OCR, HTR, layout detection, table parsers, text chunking, etc. This process is difficult to set up, prone to errors, and loses subtle visual cues such as font size and text color. (2/10)
Jordan
Jordan@ZedDou1·
@cwolferesearch Just for the sake of sharing, there's been some research saying the opposite, though (but I didn't go through all the reading/papers) x.com/learnprompting…
Learn Prompting@learnprompting

🚨 Role Prompting doesn't work... Our team at @learnprompting led a year-long study with co-authors from @OpenAI & @Microsoft, analyzing over 1,500 prompting papers. We narrowed it down to 58 different prompting techniques and we analyzed every one. Here's what we found...

Cameron R. Wolfe, Ph.D.
Cameron R. Wolfe, Ph.D.@cwolferesearch·
Quick and easy tip for better evals: Role prompting is a useful trick for generating text with LLMs, but combining role prompting with LLM-as-a-judge can be a problem...

TL;DR: If we tell the LLM judge that it is an "expert" in a certain field within the system message, then the scores outputted by LLM-as-a-judge become much harsher / lower and evaluation quality (i.e., correlation with human evaluation) deteriorates as a result.

What is role prompting? This concept refers to a prompt that asks a language model to assume a role; e.g., a math expert, a poet, a pirate, etc. Upon assuming the role, the LLM will tailor its responses accordingly. For example, an LLM told that it is a pirate might say things like "Ahoy", while an LLM told that it is a math expert may perform arithmetic more accurately.

Why would we use this? Role prompting is a fun trick that can be useful for testing the instruction-following capabilities of an LLM. However, it can also have a non-negligible impact on the quality of the LLM's generations. For example, role prompting an LLM as an expert in a certain field tends to improve the quality of its generations when solving domain-specific problems (e.g., math, reasoning, writing, and more).

What is LLM-as-a-judge? LLM-as-a-judge is a simple but powerful evaluation technique that uses an LLM to score or evaluate the output of another LLM. The simplest way to use LLM-as-a-judge is to write a prompt asking an LLM (e.g., GPT-4) to score a model's response to some prompt on a scale from 1-10 (see image below). We can also pass pairs of responses and ask which one is better (i.e., pairwise scoring instead of pointwise scoring).

Role prompting + LLM-as-a-judge: Role prompting is very useful for generating text with LLMs for the reasons mentioned above. However, we should be very careful when using role prompting in tandem with LLM-as-a-judge. In particular, if we tell the LLM judge that it is an "expert" in a certain field (e.g., math or poetry), it will be much harsher when scoring outputs. In fact, many of the scores will be as low as possible (i.e., 1 out of 10)!
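The caveat can be made concrete with a toy pointwise judge prompt. The exact wording, the role line, and the `judge_prompt` helper are all illustrative assumptions, not any library's API; the only point is that the role sits in the system message, which is the variable the thread warns about:

```python
def judge_prompt(response, use_expert_role=False):
    """Build a pointwise LLM-as-a-judge prompt; optionally role-prompt the
    judge as an expert in its system message."""
    system = ("You are an expert poet and literary critic."
              if use_expert_role else "You are a helpful assistant.")
    return (f"[system] {system}\n"
            f"[user] Score the following response from 1 to 10. "
            f"Reply with the number only.\n\nResponse: {response}")

plain = judge_prompt("roses are red...")
harsh = judge_prompt("roses are red...", use_expert_role=True)
# Per the thread, the 'expert' variant tends to produce much harsher, lower
# scores, so keep role prompting out of the judge's system message.
print(plain)
```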
Jordan
Jordan@ZedDou1·
@_xjdr Does your automated eval rely on an LLM-as-a-judge approach with an LLM that demonstrated a high level of agreement with you, or is it more about evaluating the new LLM on a handcrafted set of "close-ended" tasks?
xjdr
xjdr@_xjdr·
one of the highest return on effort activities i have ever done is setting up automated evals and benchmarking for any new model i touch. I can no longer imagine living without it
Jordan
Jordan@ZedDou1·
@Xianbao_QIAN Thanks for the answer! Does the absorption part mean that, somehow, the up-projection matrix is embedded into the query vector, such that multiplying the query vector by the latent KV vector is equivalent to computing the regular dot product between query and key?
Tiezhen WANG
Tiezhen WANG@Xianbao_QIAN·
@ZedDou1 The projection matrix can be absorbed into another matrix, so the latent KV does not introduce more complexity during inference compared to a normal KV cache. The challenge is how to apply RoPE attention efficiently, since it can't be absorbed. There is a modification of RoPE for the latent version.
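The absorption Jordan asks about can be checked numerically: with a (made-up) up-projection W_uk from latent space to key space, the attention score q·(W_uk c) equals (W_ukᵀ q)·c, so W_uk can be folded into the query side ahead of time and only the small latent vector c needs caching. A minimal pure-Python check of that identity:

```python
def matvec(M, v):
    # Matrix-vector product for plain nested lists.
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def transpose(M):
    return [list(col) for col in zip(*M)]

W_uk = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]  # up-projection: latent -> key dim
q = [0.2, -0.4, 1.0]                           # query vector, key-dimensional
c = [0.7, 0.3]                                 # cached latent KV vector

score_naive = dot(q, matvec(W_uk, c))                # up-project c, then dot
score_absorbed = dot(matvec(transpose(W_uk), q), c)  # W_uk folded into q
print(score_naive, score_absorbed)  # identical scores
```

Since both paths give the same score, the cache never needs the full key vectors, which is exactly why RoPE (applied per token between q and k) breaks the trick.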
Tiezhen WANG
Tiezhen WANG@Xianbao_QIAN·
Attention layer normally uses KV Cache to reduce repetitive compute, but it consumes significant GPU RAM, limiting concurrent requests. DeepSeek V2 introduces Multi-head Latent Attention (MLA), which stores only a small latent representation, resulting in substantial RAM savings.
Jordan
Jordan@ZedDou1·
@BlackHC @arankomatsuzaki From what I've understood, you can either not use them (regular decoding process) or use them for speculative decoding to speed up inference; the additional heads then act as the draft model in the speculative-decoding setup
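That draft-and-verify loop can be sketched with stand-in models. Everything here is a toy assumption: `target_next` plays the main model (it just counts up), `draft_heads` plays the k extra prediction heads, and the third draft token is deliberately wrong to show partial acceptance:

```python
def target_next(prefix):
    # Stand-in for the main model's next-token choice.
    return prefix[-1] + 1

def draft_heads(prefix, k=3):
    # Stand-in for the k extra heads drafting k tokens in one pass.
    return [prefix[-1] + 1, prefix[-1] + 2, prefix[-1] + 99]

def speculative_step(prefix):
    """Accept draft tokens while the main model agrees; on the first
    mismatch, take the main model's token and stop."""
    proposed = draft_heads(prefix)
    out = list(prefix)
    for t in proposed:
        if t == target_next(out):
            out.append(t)                 # accepted draft token: free progress
        else:
            out.append(target_next(out))  # mismatch: keep the real token
            break
    return out

print(speculative_step([0]))  # → [0, 1, 2, 3]
```

One verification pass here yields three tokens instead of one, which is where the inference speedup comes from when draft acceptance rates are high.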
Aran Komatsuzaki
Aran Komatsuzaki@arankomatsuzaki·
Meta presents Better & Faster Large Language Models via Multi-token Prediction - training language models to predict multiple future tokens at once results in higher sample efficiency - up to 3x faster at inference arxiv.org/abs/2404.19737
Jordan
Jordan@ZedDou1·
@yuntiandeng I assume you considered all the non-RTed papers as negatives. This may be problematic, as we may have false negatives in the dataset, unless he's truly a machine that is able to go through all the published papers (is he?) 😁
Yuntian Deng
Yuntian Deng@yuntiandeng·
Will your paper catch the eye of @_akhaliq? I built a demo that predicts if AK will select a paper. It has 50% F1 using DeBERTa finetuned on data from past year. As a test, our upcoming WildChat arXiv has a 56% chance. Hopefully not a false positive🤞 🔗huggingface.co/spaces/yuntian…
Jordan
Jordan@ZedDou1·
@abacaj @major_katsurAGI @nisten @altryne Could it be done to avoid a discrepancy in the meaning of the EOS? E.g., when pre-training with packing, the EOS only marks the separation between two sequences (so the part before the EOS is not necessarily complete), while it is complete when doing instruction fine-tuning
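Jordan's packing scenario can be sketched as follows. `pack` is an illustrative helper, not any tokenizer's API; it just shows why an EOS seen mid-window during packed pretraining carries a different meaning than the EOS that terminates a fine-tuning target:

```python
EOS = "</s>"

def pack(sequences, window=8):
    """Concatenate token lists with EOS between them, then slice the stream
    into fixed-size training windows; a window can start or end mid-sequence."""
    stream = []
    for seq in sequences:
        stream.extend(seq + [EOS])
    return [stream[i:i + window] for i in range(0, len(stream), window)]

windows = pack([["a", "b", "c"], ["d", "e"], ["f", "g", "h", "i"]], window=8)
print(windows[0])
# The EOS tokens inside a packed window only separate unrelated sequences;
# they do not mean "the text before me is a complete answer", unlike the
# single EOS appended to an instruction-tuning target.
```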
anton
anton@abacaj·
@major_katsurAGI @nisten @altryne A different format is usually used during fine-tuning, though in Mistral Instruct they used the same EOS token as in pretraining (</s>). Guess it just depends on what the authors were thinking 🤷‍♂️
anton
anton@abacaj·
See some interesting takes here about phi-3. Some say it sucks (was trained on test set!?), others say it outperforms 70B models. I don’t know what it was trained on, but it’s pretty easy to take the weights and try the model yourself - can even try it without a GPU. I’m finding it pretty useful for extraction tasks