
John Trengrove
@trengrj
Applied Research @weaviate_io

We need to publicly clarify serious issues in Google's ICLR 2026 paper TurboQuant.

TurboQuant misrepresents RaBitQ in three ways:
1. It avoids acknowledging a key methodological similarity (the JL transform)
2. It calls our theory "suboptimal" with no evidence
3. It reports results under unfair experimental settings

We raised these concerns with the authors before submission, but they chose not to address them in the paper. The paper was accepted at ICLR 2026 and heavily promoted by Google (tens of millions of views). At that scale, uncorrected claims quickly become "consensus."

Facts:
1. RaBitQ already proves asymptotic optimality (FOCS'17 bound)
2. TurboQuant uses the same random rotation step but does not state the connection
3. Their experiments ran RaBitQ on a single CPU core vs. TurboQuant on an A100 GPU

None of this is properly disclosed. We've filed a formal complaint and posted on OpenReview (openreview.net/forum?id=tO3AS…). We'll release a detailed technical report on arXiv.

Our goal is simple: keep the academic record accurate. Would appreciate people taking a look and sharing.

Multi-vector embeddings (ColBERT, ColPali) are budget killers. But MUVERA can cut your memory footprint by 70%.

Multi-vector models offer incredible retrieval quality but suffer from massive memory overhead and slow indexing. MUVERA (Multi-Vector Retrieval via Fixed Dimensional Encodings) compresses these into single, fixed-dimensional vectors.

How it works: MUVERA condenses a sequence of vectors (e.g., 100x96d) into one vector via:
1️⃣ Space Partitioning: Groups vectors into buckets using SimHash or k-means clustering.
2️⃣ Dimensionality Reduction: Applies a random linear projection to compress each sub-vector while approximately preserving dot products.
3️⃣ Repetitions: Repeats the process multiple times and concatenates the results to improve accuracy.
4️⃣ Final Projection: Optional final compression (not used in Weaviate's implementation).
(A rough code sketch of steps 1–3 follows below.)

The impact (LoTTE benchmark):
- Memory: 12GB → <1GB
- Indexing: 20+ mins → 3-6 mins
- HNSW graph: 99% smaller

There's a trade-off: you trade a slight dip in raw recall for massive efficiency gains. However, by tuning the HNSW `ef` parameter (e.g., `ef=512`), you can recover 80-90%+ recall while keeping costs low.

When should you use MUVERA?
→ Large-scale production RAG
→ Systems where memory/infrastructure costs are the direct bottleneck
→ Use cases requiring fast indexing

MUVERA in @weaviate_io 1.31+ takes just a couple of lines of code. You can tune three parameters (k_sim, d_proj, r_reps) to balance memory usage and retrieval accuracy for your specific use case.

Read the full technical deep-dive here: weaviate.io/blog/muvera?ut…
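Here's a minimal NumPy sketch of steps 1️⃣–3️⃣, assuming SimHash partitioning and sum aggregation. The function name `muvera_fde` and the defaults are illustrative, not Weaviate's API; the real algorithm also treats the query and document sides slightly differently.

```python
import numpy as np

def muvera_fde(vectors, k_sim=4, d_proj=16, r_reps=10, seed=0):
    """Toy MUVERA-style fixed dimensional encoding.

    vectors: (n, d) array of one document's multi-vector embeddings.
    Returns a single vector of length r_reps * 2**k_sim * d_proj.
    """
    n, d = vectors.shape
    rng = np.random.default_rng(seed)
    blocks = []
    for _ in range(r_reps):
        # 1) Space partitioning via SimHash: k_sim random hyperplanes
        #    give each vector a bucket id in [0, 2**k_sim).
        hyperplanes = rng.standard_normal((d, k_sim))
        bits = ((vectors @ hyperplanes) > 0).astype(int)   # (n, k_sim)
        bucket_ids = bits @ (2 ** np.arange(k_sim))        # (n,)

        # 2) Dimensionality reduction: a shared random projection
        #    d -> d_proj that approximately preserves dot products.
        proj = rng.standard_normal((d, d_proj)) / np.sqrt(d_proj)

        # Aggregate each bucket (sum here, as a simplification),
        # then project each bucket's aggregate.
        rep = np.zeros((2 ** k_sim, d_proj))
        for b in range(2 ** k_sim):
            members = vectors[bucket_ids == b]
            if len(members):
                rep[b] = members.sum(axis=0) @ proj

        # 3) Repetitions: concatenate this repetition's encoding.
        blocks.append(rep.ravel())

    # (Step 4, the optional final projection, is omitted here.)
    return np.concatenate(blocks)
```

With these defaults a 100x96d sequence collapses to one 10 * 2^4 * 16 = 2,560-dim vector, and the dot product between a query FDE and a document FDE is designed to approximate the multi-vector MaxSim score.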

Stop storing embeddings. A laptop can now index 60 million text chunks using 6GB, not 200GB.

LEANN, a new open-source project, flips how vector search works.

𝗧𝗵𝗶𝘀 𝗶𝗻𝗱𝗲𝘅 𝗱𝗼𝗲𝘀 𝗻𝗼𝘁 𝘀𝘁𝗼𝗿𝗲 𝗲𝗺𝗯𝗲𝗱𝗱𝗶𝗻𝗴𝘀
Instead of saving every vector, it stores a compact graph. Embeddings get recomputed only when a query actually needs them.
• Graph-based selective recomputation (a toy sketch follows below)
• High-degree node pruning to keep recall stable
• No accuracy drop versus FAISS-style indexes

𝗧𝗵𝗲 𝘀𝘁𝗼𝗿𝗮𝗴𝗲 𝗴𝗮𝗶𝗻𝘀 𝗮𝗿𝗲 𝗺𝗮𝘀𝘀𝗶𝘃𝗲
Email archives shrink from gigabytes to megabytes. Browser history fits in single-digit MBs. A 60M-document corpus fits on a laptop SSD.

𝗜𝘁 𝗿𝘂𝗻𝘀 𝗲𝗻𝘁𝗶𝗿𝗲𝗹𝘆 𝗹𝗼𝗰𝗮𝗹
No cloud calls. No telemetry. Everything stays on-device with zero ongoing cost.

𝗜𝘁 𝘂𝗻𝗹𝗼𝗰𝗸𝘀 𝗹𝗼𝗰𝗮𝗹 𝗥𝗔𝗚 𝗮𝘁 𝗻𝗲𝘄 𝘀𝗰𝗮𝗹𝗲
You can semantically search files, emails, chats, codebases, and live MCP sources. All from one local index, without changing your workflow.
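A toy illustration of graph-based selective recomputation, not LEANN's actual code: `embed`, `graph`, and `search` here are hypothetical stand-ins. The point is that only visited nodes ever get embedded, so only the graph and the raw text need to be stored.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder for a local embedding model: the expensive call
    that selective recomputation tries to minimize."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(64)
    return v / np.linalg.norm(v)

def search(query, chunks, graph, entry=0, beam=4, steps=10):
    """Greedy best-first search over a proximity graph, recomputing
    chunk embeddings lazily instead of reading them from disk."""
    q = embed(query)
    cache = {}                      # node id -> recomputed embedding
    def score(i):
        if i not in cache:          # embed only nodes we actually visit
            cache[i] = embed(chunks[i])
        return float(q @ cache[i])

    frontier, visited = [entry], {entry}
    best = [(score(entry), entry)]
    for _ in range(steps):
        candidates = []
        for node in frontier:
            for nb in graph[node]:
                if nb not in visited:
                    visited.add(nb)
                    best.append((score(nb), nb))
                    candidates.append(nb)
        if not candidates:
            break
        # expand only the `beam` most promising new nodes
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam]
    best.sort(reverse=True)
    return best[:beam], len(cache)  # top hits + embeddings recomputed

chunks = [f"chunk {i} ..." for i in range(1000)]
graph = {i: [(i + j) % 1000 for j in (1, 7, 42, 311)] for i in range(1000)}
hits, n_embedded = search("query text", chunks, graph)
print(hits, f"embedded {n_embedded}/{len(chunks)} chunks")
```

Even in this toy setup, only a few dozen of the 1,000 chunks get embedded per query, which is the trade LEANN describes: pay a little query-time compute instead of storing every vector.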

me: no but RAG can be done with plain old exact matching grep, it's totally fine
@mixedbreadai: rebrand search as grep
me: alright, then I guess I am doing grep now

New in Weaviate 1.32+: 8-bit Rotational Quantization!

Compress your vectors by 4x while simultaneously improving speed AND quality - yes, it's actually better than uncompressed vectors.

But how does it work? Random rotations make every vector well suited for quantization by smoothing entries and redistributing similarity information across all dimensions. This universal approach can handle any dataset without training.

Results speak for themselves:
✨ 4x memory reduction
✨ 15-50% faster throughput
✨ Near-perfect recall maintained

Check out the full technical deep-dive: weaviate.io/blog/8-bit-rot…
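A generic sketch of the rotate-then-quantize idea, assuming a Haar-random orthogonal rotation and per-vector min/max scalar quantization; this is the shape of the technique, not Weaviate's implementation.

```python
import numpy as np

def random_rotation(d, seed=0):
    """Random orthogonal matrix via QR of a Gaussian matrix."""
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))  # sign fix for a uniform rotation

def quantize_8bit(x):
    """Map each entry of a (rotated) vector to one of 256 levels."""
    lo, hi = x.min(), x.max()
    scale = max(hi - lo, 1e-12)     # guard against constant vectors
    codes = np.round((x - lo) / scale * 255).astype(np.uint8)
    return codes, lo, hi

def dequantize(codes, lo, hi):
    return codes.astype(np.float32) / 255 * (hi - lo) + lo

d = 96
R = random_rotation(d)
v = np.random.default_rng(1).standard_normal(d)

# Rotation smooths the entries, so one min/max range fits them all:
codes, lo, hi = quantize_8bit(R @ v)     # 96 bytes instead of 384
v_hat = R.T @ dequantize(codes, lo, hi)  # rotate back to compare
print(np.dot(v, v_hat) / np.dot(v, v))   # ~1.0: dot products survive
```

The 4x figure falls out of the storage math: float32 entries (4 bytes) become uint8 codes (1 byte), plus a couple of floats of per-vector metadata.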