jonah



I'm so excited to introduce this! We've worked on a million different moving parts to produce this. I'm fairly confident it's the best multimodal model that exists, period -- and it's not too shabby at pushing back the LIMITs of retrieval either...

Introducing Mixedbread Wholembed v3, our new SOTA retrieval model across all modalities and 100+ languages. Wholembed v3 brings best-in-class search to text, audio, images, PDFs, videos... You can now get the best retrieval performance on your data, no matter its format.


a significant % of ml researchers might be hooked by what happened in ONE day. ai seems to be doing the research loop fascinatingly well (understand the problem → propose a change → train/test it → measure results → keep the better version → repeat) and genuinely reducing research friction. we are early to automated experimentation; frontier scale could be an interesting watch.
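The loop in that post can be sketched as a simple propose/evaluate/keep hill-climb. This is a hypothetical toy: the lambda "metric" and the `research_loop` helper are made up for illustration, standing in for real model training and evaluation.

```python
import random

def research_loop(train_and_eval, propose_change, baseline, steps=50):
    # Hill-climb: propose a change, train/test it, measure, keep if better.
    best, best_score = baseline, train_and_eval(baseline)
    for _ in range(steps):
        candidate = propose_change(best)     # propose a change
        score = train_and_eval(candidate)    # train/test it + measure results
        if score > best_score:               # keep the better version
            best, best_score = candidate, score
    return best, best_score                  # ...and repeat

# Toy stand-ins: a fake "metric" peaking at config=3.0, random tweaks.
random.seed(0)
train_and_eval = lambda cfg: -(cfg - 3.0) ** 2
propose_change = lambda cfg: cfg + random.uniform(-1.0, 1.0)

best, score = research_loop(train_and_eval, propose_change, baseline=0.0)
```

The automation story is exactly that each arrow in the loop becomes a function call the system can run without a human in the middle.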


Wow. It’s absolutely preposterous that ColBERTv2, a 100M parameter retriever, still fricking outperforms Qwen3-Embed-8B, an 80x bigger dense retriever. ColBERTv2 was trained by one dude in 2021 on 4 A100s for 4 days, on top of puny BERT-base. Single-vector models hold IR back.
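For context on why a tiny multi-vector model can beat a huge dense one: ColBERT scores with late interaction (MaxSim), which mean-pooled single vectors cannot express. A minimal NumPy sketch (illustrative only, not ColBERT's actual code; the toy vectors are made up):

```python
import numpy as np

def maxsim_score(query_vecs, doc_vecs):
    # Late interaction: each query token takes its best-matching doc token.
    sims = query_vecs @ doc_vecs.T        # (n_query_tokens, n_doc_tokens)
    return float(sims.max(axis=1).sum())  # MaxSim per query token, summed

def pooled_score(query_vecs, doc_vecs):
    # Single-vector baseline: mean-pool tokens, then one dot product.
    return float(query_vecs.mean(axis=0) @ doc_vecs.mean(axis=0))

q  = np.array([[1., 0.], [0., 1.]])   # query with two distinct "tokens"
d1 = np.array([[1., 0.], [0., 1.]])   # doc matching both query tokens
d2 = np.array([[1., 0.], [1., 0.]])   # doc matching only the first

print(maxsim_score(q, d1), maxsim_score(q, d2))  # 2.0 1.0
print(pooled_score(q, d1), pooled_score(q, d2))  # 0.5 0.5
```

Note the pooled scores tie: averaging collapses the per-token structure that lets MaxSim tell the two documents apart.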


wait, no one's gonna talk about how shit openclaw's code is?


none of these racist comments are going to stop me from enjoying a steak and red wine valentine’s day dinner with my extremely attractive (even by korean standards) korean girlfriend who i love very much and maybe if you all took my advice you could find this happiness too


openai cafe might have the best slice in sf



Huge opening for a tiny native “select text to explain it, with the rest of the paper as context” tool. It would go extremely hard if done well.


Multi-vector embeddings (ColBERT, ColPali) are budget killers. But MUVERA can cut your memory footprint by 70%.

Multi-vector models offer incredible retrieval quality but suffer from massive memory overhead and slow indexing. MUVERA (Multi-Vector Retrieval via Fixed Dimensional Encodings) compresses these into single, fixed-dimensional vectors.

How it works: MUVERA condenses a sequence of vectors (e.g., 100x96d) into one vector via:
1️⃣ Space Partitioning: Groups vectors into buckets using SimHash or k-means clustering.
2️⃣ Dimensionality Reduction: Applies random linear projection to compress each sub-vector while preserving dot products.
3️⃣ Repetitions: Repeats the process multiple times and concatenates results to improve accuracy.
4️⃣ Final Projection: Optional final compression (not used in Weaviate's implementation).

The impact (LoTTE benchmark):
- Memory: 12GB → <1GB
- Indexing: 20+ mins → 3-6 mins
- HNSW graph: 99% smaller

There’s a trade-off: you trade a slight dip in raw recall for massive efficiency gains. However, by tuning the HNSW `ef` parameter (e.g., `ef=512`), you can recover 80-90%+ recall while keeping costs low.

When should you use MUVERA?
→ Large-scale production RAG
→ Systems where memory/infrastructure costs are the direct bottleneck
→ Use cases requiring fast indexing

MUVERA in @weaviate_io 1.31+ takes just a couple of lines of code. You can tune three parameters (k_sim, d_proj, r_reps) to balance memory usage and retrieval accuracy for your specific use case.

Read the full technical deep-dive here: weaviate.io/blog/muvera?ut…
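The four steps above can be sketched as a minimal fixed-dimensional-encoding routine. This is a simplified illustration of the idea, not Weaviate's implementation: it uses SimHash for the partitioning step, skips the optional final projection, and borrows the parameter names (k_sim, d_proj, r_reps) and the 100x96d example from the post.

```python
import numpy as np

def fde(vectors, k_sim=3, d_proj=8, r_reps=4, seed=0):
    """MUVERA-style Fixed Dimensional Encoding sketch.

    vectors: (n, d) array of a document's multi-vector embeddings.
    Returns one vector of length r_reps * 2**k_sim * d_proj.
    """
    rng = np.random.default_rng(seed)
    n, d = vectors.shape
    n_buckets = 2 ** k_sim
    parts = []
    for _ in range(r_reps):
        # 1) Space partitioning via SimHash: k_sim random hyperplanes
        #    give each vector a k_sim-bit bucket id.
        planes = rng.standard_normal((d, k_sim))
        bits = (vectors @ planes > 0).astype(int)
        bucket_ids = bits @ (2 ** np.arange(k_sim))
        # 2) Dimensionality reduction: random projection that preserves
        #    dot products in expectation.
        proj = rng.standard_normal((d, d_proj)) / np.sqrt(d_proj)
        buckets = np.zeros((n_buckets, d_proj))
        np.add.at(buckets, bucket_ids, vectors @ proj)  # sum per bucket
        # 3) Repetitions: concatenate each repetition's encoding.
        parts.append(buckets.ravel())
    return np.concatenate(parts)

doc = np.random.default_rng(1).standard_normal((100, 96))  # 100 x 96d tokens
v = fde(doc)
print(v.shape)  # (256,) = r_reps * 2**k_sim * d_proj = 4 * 8 * 8
```

Because queries and documents share the same random planes and projections (same seed), the dot product of two FDEs approximates the multi-vector similarity, which is what makes single-vector HNSW indexing possible.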




DeepSeek OCR 2. First reaction: it uses Qwen2-0.5B. Qwen2 came out in June 2024. If they had started this work ≤1 year ago, they'd have used at least Qwen2.5 (September 2024). To me this confirms that the OCR series was done in the DeepSeek-V2 era. It's uncanny.










