John Trengrove

234 posts


@trengrj

Applied Research @weaviate_io

Australia · Joined June 2013
857 Following · 339 Followers
John Trengrove @trengrj:
@CJHandmer This seems needlessly alarmist when Australia spends roughly half what the US spends on healthcare per capita, and robotics in aged care will be transformative.
Casey Handmer @CJHandmer:
When I was in Australia last December, I found it impossible to transact with any business where there weren't latent government price controls or subsidies of some kind. I came to the realization that, with the public sector growing about 5x faster than the private sector, Australia was well on the way to an effectively government-run economy: communism by stealth. I dug deeper - the point of no return occurred in about 2013. caseyhandmer.wordpress.com/2026/04/16/aus…
John Trengrove @trengrj:
@lateinteraction As an example, look at this table on storage efficiency from Nemotron ColEmbed V2. The single-vector model uses nearly 1000x less storage. Compression of residuals etc. means the factor is smaller in practice, but there is still so much efficiency left on the table.
[image: storage-efficiency table]
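(A back-of-envelope sketch of where a factor like that can come from; these sizes are made up for illustration, not taken from the ColEmbed table:)

```python
# Hypothetical sizes, for illustration only.
tokens_per_doc = 2048   # multi-vector: one vector per token
multi_dim = 384         # per-token vector dimension
single_dim = 768        # single pooled vector dimension
bytes_per_float = 4     # float32

multi_bytes = tokens_per_doc * multi_dim * bytes_per_float   # ~3.1 MB per doc
single_bytes = single_dim * bytes_per_float                  # ~3 KB per doc
print(multi_bytes / single_bytes)  # 1024x for these (illustrative) sizes
```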
John Trengrove @trengrj:
Great LIR keynote by @lateinteraction, and 100% this on the need for fewer vectors per token in these models.
[image]
John Trengrove @trengrj:
@andersonbcdefg Haha they should sell a recursive stack of crawling and anti-crawling services with continually increasing prices.
John Trengrove @trengrj:
Hey @bclavie, we were primarily showing memory reduction (80%+) and QPS improvements with the set of MUVERA parameters in the blog post. I agree that sometimes you would want to increase projections for less memory reduction and higher recall. The MUVERA paper demonstrated that it outperformed PLAID on latency when controlling for a target recall level. Some datasets achieved 99.8%+ recall, while others (e.g. SciDocs at 57%) achieved much lower. Especially with the more complex MaxSim distance calculation, you don't always achieve great ANN index recall with multi-vector models. Personally, I think MUVERA will win over PLAID-style approaches for ecosystem reasons: with MUVERA you can adapt any ANN index (HNSW, DiskANN, ScaNN, Faiss, etc.) with just the FDE encoding and a MaxSim implementation (leveraging everything already done in this area), while PLAID requires a complex 4-stage pruning pipeline.
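(For anyone unfamiliar, MaxSim here is the standard ColBERT-style late-interaction score; a minimal numpy sketch:)

```python
import numpy as np

def maxsim(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """ColBERT-style MaxSim: for each query token vector, take the best
    dot product against any document token vector, then sum over query tokens."""
    sims = query_vecs @ doc_vecs.T        # (n_query_tokens, n_doc_tokens)
    return float(sims.max(axis=1).sum())  # best doc token per query token
```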
Ben Clavié @bclavie:
I'm personally very bearish on MUVERA due to its many, many failure cases, but I have a lot of respect for the @weaviate_io folks, so I gave this another deep read to see if there were things that could change my mind. However, this has me a bit puzzled: if I'm reading the graph below right, it means that MUVERA itself produces a ~50+% incompressible performance degradation at commonly used indexing parameters, and still a ~20% degradation at near-brute-force search-tier parameters (ef=1024), meaning that the degradation would be purely due to MUVERA itself. For most retrieval uses this would make the method completely unusable, as this degradation for many workflows is almost similar to the one we'd experience from replacing semantic search with pure bm25/keyword search. I feel like I'm missing something here, so I'm very happy to be corrected if I'm misinterpreting the results!
[image: recall graph]
Femke Plantinga @femke_plantinga:
Multi-vector embeddings (ColBERT, ColPali) are budget killers. But MUVERA can cut your memory footprint by 70%.

Multi-vector models offer incredible retrieval but suffer from massive memory overhead and slow indexing. MUVERA (Multi-Vector Retrieval via Fixed Dimensional Encodings) compresses these into single, fixed-dimensional vectors.

How it works: MUVERA condenses a sequence of vectors (e.g., 100x96d) into one vector via:
1️⃣ Space Partitioning: groups vectors into buckets using SimHash or k-means clustering.
2️⃣ Dimensionality Reduction: applies a random linear projection to compress each sub-vector while preserving dot products.
3️⃣ Repetitions: repeats the process multiple times and concatenates the results to improve accuracy.
4️⃣ Final Projection: optional final compression (not used in Weaviate's implementation).

The impact (LoTTE benchmark):
- Memory: 12GB → <1GB
- Indexing: 20+ mins → 3-6 mins
- HNSW graph: 99% smaller

There's a trade-off: you trade a slight dip in raw recall for massive efficiency gains. However, by tuning the HNSW `ef` parameter (e.g., `ef=512`), you can recover 80-90%+ recall while keeping costs low.

When should you use MUVERA?
→ Large-scale production RAG
→ Systems where memory/infrastructure costs are the direct bottleneck
→ Use cases requiring fast indexing

MUVERA in @weaviate_io 1.31+ takes just a couple of lines of code. You can tune three parameters (k_sim, d_proj, r_reps) to balance memory usage and retrieval accuracy for your specific use case. Read the full technical deep-dive here: weaviate.io/blog/muvera?ut…
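(To make the partition/project/repeat steps concrete, here is a heavily simplified document-side FDE sketch using the k_sim, d_proj, r_reps parameter names from the post; real implementations also handle empty buckets, asymmetric query/document encodings, and more:)

```python
import numpy as np

rng = np.random.default_rng(0)

def fde_encode(vecs: np.ndarray, k_sim: int = 4, d_proj: int = 32, r_reps: int = 8):
    """Simplified MUVERA-style fixed dimensional encoding for one document.
    vecs: (n_tokens, dim). Output size: r_reps * 2**k_sim * d_proj."""
    n, dim = vecs.shape
    blocks = []
    for _ in range(r_reps):                              # 3. repetitions
        planes = rng.standard_normal((k_sim, dim))       # 1. SimHash partitioning
        proj = rng.standard_normal((dim, d_proj)) / np.sqrt(d_proj)  # 2. projection
        bucket = ((vecs @ planes.T) > 0).astype(int) @ (1 << np.arange(k_sim))
        block = np.zeros((2 ** k_sim, d_proj))
        for b in range(2 ** k_sim):
            members = vecs[bucket == b]
            if len(members):
                block[b] = members.sum(axis=0) @ proj    # aggregate, then project
        blocks.append(block.ravel())
    return np.concatenate(blocks)                        # 4. no final projection here
```

The point of the encoding is that a single inner product between query and document FDEs approximates the MaxSim score, which is what lets any off-the-shelf single-vector ANN index take over.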

John Trengrove @trengrj:
I really like LEANN, but it does require embedding chunks on the fly for each query (for each batch of distance calculations as you navigate the graph). For most use cases, the costs (in API calls, time, or GPUs) of creating embeddings are high enough that you don't want to do this. However, when paired with static embeddings (like @minishlab's model2vec), LEANN could be a good solution for storage-constrained workloads.
Lior Alexander @LiorOnAI:
Stop storing embeddings. A laptop can now index 60 million text chunks using 6GB, not 200GB. LEANN, a new open-source project, flips how vector search works.

This index does not store embeddings
Instead of saving every vector, it stores a compact graph. Embeddings get recomputed only when a query actually needs them.
• Graph-based selective recomputation
• High-degree node pruning to keep recall stable
• No accuracy drop versus FAISS-style indexes

The storage gains are massive
Email archives shrink from gigabytes to megabytes. Browser history fits in single-digit MBs. A 60M-document corpus fits on a laptop SSD.

It runs entirely local
No cloud calls. No telemetry. Everything stays on-device with zero ongoing cost.

It unlocks local RAG at new scale
You can semantically search files, emails, chats, codebases, and live MCP sources. All from one local index, without changing your workflow.
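(A toy sketch of the "recompute embeddings while traversing the graph" idea John describes above; all names here are illustrative, not LEANN's actual API:)

```python
import heapq
import numpy as np

def search_without_stored_embeddings(query, graph, chunks, embed, entry, k=10, budget=50):
    """Best-first graph search where chunk embeddings are recomputed lazily.
    graph: node id -> neighbor ids; chunks: node id -> raw text;
    embed: text -> unit-norm np.ndarray (the expensive call being amortized)."""
    q = embed(query)
    visited = {entry}
    heap = [(-float(q @ embed(chunks[entry])), entry)]   # max-heap via negation
    results = []
    while heap:
        neg_sim, node = heapq.heappop(heap)
        results.append((-neg_sim, node))
        for nb in graph[node]:
            if nb not in visited and len(visited) < budget:
                visited.add(nb)
                sim = float(q @ embed(chunks[nb]))       # embedding computed only here
                heapq.heappush(heap, (-sim, nb))
    return sorted(results, reverse=True)[:k]
```

Every distance evaluation costs one `embed` call, which is exactly why pairing this with cheap static embeddings matters.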

John Trengrove @trengrj:
@copyconstruct I think they are still being built, but people aren't hearing about them. The build-vs-buy pendulum has swung heavily toward build with AI, so you now have 50 vibe-coded file systems etc. with few users instead of a category winner.
Cindy Sridharan @copyconstruct:
I’m genuinely curious about this. It’s easier than ever to whip up prototypes. Why *hasn’t* there been a Cambrian explosion of new file systems, storage engines, security protocols, orchestrators etc? Are these now “boring” problems no one wants to solve anymore?
Cindy Sridharan @copyconstruct:
The 2013-2015 years gave us groundbreaking tools like Docker, Kubernetes, Prometheus etc. We're in the age of AI-turbocharged productivity, and in the last 3 years, we haven't seen many similar tools developed or open sourced. You'd think solving "hard" problems would have become easier?
John Trengrove @trengrj:
@JacobianNeuro Really interesting! Do you think the scaling improvements would hold if you kept a fixed output dimension across all models?
Jacob Portes @JacobianNeuro:
We benchmarked retrieval performance across LLM model sizes from 125M - 7B parameters pretrained on 1B - 2T tokens. By varying the token-parameter ratio (TPR) we show that:
- Retrieval scales with pretraining FLOPs (surprise!)
- Retrieval and ICL scores are strongly correlated
[image]
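(For reference on the quantities being varied; the ~6·N·D rule of thumb for pretraining compute is a standard approximation, not something from this thread:)

```python
def pretraining_flops(n_params: float, n_tokens: float) -> float:
    return 6 * n_params * n_tokens     # ~6 FLOPs per parameter per token

def token_parameter_ratio(n_params: float, n_tokens: float) -> float:
    return n_tokens / n_params

# e.g. the largest setting in the sweep: 7B params on 2T tokens
print(f"{pretraining_flops(7e9, 2e12):.1e}")   # 8.4e+22 FLOPs
print(token_parameter_ratio(7e9, 2e12))        # TPR ≈ 286
```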
Jacob Portes @JacobianNeuro:
@ilyasut says the age of scaling is over - good thing we put this paper out in time! Many recent embedding models are finetuned versions of pretrained LLMs. We asked 🤓: How does retrieval performance scale with pretraining FLOPs? 📄 paper: arxiv.org/abs/2508.17400
John Trengrove @trengrj:
@Yuchenj_UW Yann in his arguments stressed the lack of a physical world model and training from sensory data, while Ilya focused more on LLMs not being coherent (flip-flopping, whack-a-mole errors, etc.), which I think makes more sense to people using LLMs day-to-day. Also, timing is important.
Yuchen Jin @Yuchenj_UW:
Genuine question: Why is there a double standard between Ilya and Yann?
[image]
John Trengrove @trengrj:
The idea that an embedding model can also be an effective reranker (E2Rank) opens up a lot of interesting opportunities, especially for local usage where you don't want to run multiple models.
[image]
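(A minimal sketch of the single-model pattern; E2Rank's actual recipe is more involved, and `embed` here stands in for any text-embedding function:)

```python
import numpy as np

def rerank_with_embedder(embed, query: str, candidates: list[str], top_k: int = 5):
    """Reuse one embedding model as a reranker: embed the query and each
    candidate, then sort candidates by cosine similarity."""
    q = embed(query)
    q = q / np.linalg.norm(q)
    scored = []
    for text in candidates:
        d = embed(text)
        scored.append((float(q @ d / np.linalg.norm(d)), text))
    return sorted(scored, reverse=True)[:top_k]
```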
Wei Ping @_weiping:
Introducing RankRAG, a novel RAG framework that instruction-tunes a single LLM for the dual purposes of top-k context ranking and answer generation in RAG.

For context ranking, it performs exceptionally well by incorporating a small fraction of ranking data into the training blend, surpassing existing expert ranking models, including the same LLM exclusively fine-tuned on a large amount of ranking data.

For retrieval-augmented generation, the RankRAG models significantly outperform GPT-4-0613, GPT-4-turbo-2024-04-09, and ChatQA-1.5 on nine knowledge-intensive benchmarks. It has also demonstrated superb capability for generalization to new domains, such as biomedical tasks.

Paper: arxiv.org/abs/2407.02485
Model weights: will be released later.
[4 images]
John Trengrove @trengrj:
@jxnlco I was actually watching one of your videos with around 200 views last week and thought the recommendation algorithm was broken. Keep going, the content is excellent.
jason @jxnlco:
youtube is really hard guys
[image]
John Trengrove @trengrj:
@atulit_gaur If they talk about powers of 2, that is one thing, but if they talk about the history of embedding models, the impact of BERT, and the hidden size of BERT-base being 768 and BERT-large being 1024, you have found a gem.
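(The head arithmetic behind those two sizes, from the original BERT configurations:)

```python
# hidden_size = num_attention_heads * head_dim, with head_dim = 64 in BERT
bert_base = 12 * 64    # 768  (12 heads)
bert_large = 16 * 64   # 1024 (16 heads)
assert (bert_base, bert_large) == (768, 1024)
```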
atulit @atulit_gaur:
Fun question to ask in an ML interview: "Why do embedding dimensions come in neat sizes like 768 or 1024, but never 739?" If they can't answer it, that's fine, but if they do, you've stumbled upon a real gem.
John Trengrove @trengrj:
Really excited to share this! We set out to develop a form of quantization at Weaviate with the following goals:
1. No training requirements, unlike product quantization.
2. Better recall than scalar or binary quantization.
3. Something we could unreservedly recommend as a better default for most users.
Rotational Quantization is our take on RaBitQ, extended with significant enhancements, including fast Walsh-Hadamard transforms to speed up rotations. Compared to using no quantization, import times improve by up to 50%, there is a significant memory reduction (vectors compressed 4x), and very little impact on recall 🚀.
Weaviate AI Database @weaviate_io:
New in Weaviate 1.32+: 8-bit Rotational Quantization! Compress your vectors by 4x while simultaneously improving speed AND quality - yes, it's actually better than uncompressed vectors.

But how does it work? Random rotations make every vector perfectly suited for quantization by smoothing entries and redistributing similarity information across all dimensions. This universal approach can handle any dataset without training.

Results speak for themselves:
✨ 4x memory reduction
✨ 15-50% faster throughput
✨ Near-perfect recall maintained

Check out the full technical deep-dive: weaviate.io/blog/8-bit-rot…
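(A toy sketch of the rotate-then-quantize idea, not Weaviate's exact algorithm: a random-sign fast Walsh-Hadamard transform acts as a cheap structured rotation that smooths entries, followed by per-vector 8-bit scalar quantization:)

```python
import numpy as np

def fwht(x: np.ndarray) -> np.ndarray:
    """Fast Walsh-Hadamard transform, O(d log d); len(x) must be a power of 2."""
    d, h = len(x), 1
    while h < d:
        x = x.reshape(-1, 2 * h)
        a, b = x[:, :h].copy(), x[:, h:].copy()
        x[:, :h], x[:, h:] = a + b, a - b
        x = x.ravel()
        h *= 2
    return x

def rotate_and_quantize(v: np.ndarray, signs: np.ndarray, bits: int = 8):
    rotated = fwht(v * signs) / np.sqrt(len(v))   # structured random rotation
    lo, hi = rotated.min(), rotated.max()
    codes = np.round((rotated - lo) / (hi - lo) * (2**bits - 1)).astype(np.uint8)
    return codes, (lo, hi)                        # codes + per-vector scale

d = 1024                                          # pad to a power of 2 otherwise
rng = np.random.default_rng(0)
signs = rng.choice([-1.0, 1.0], size=d)           # the "random" in the rotation
codes, scale = rotate_and_quantize(rng.standard_normal(d), signs)
```

Dividing by sqrt(d) keeps the sign-flipped transform orthonormal, so inner products are preserved under the rotation.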
