Herumb Shandilya 🦀

1.7K posts


@krypticmouse

Research @ScalingIntelLab @HazyResearch | Incoming Research Engineer @mixedbreadai | Building DSRs | MSCS, ColBERT, DSPy @Stanford

Stanford · Joined December 2013
512 Following · 2.4K Followers
Pinned Tweet
Herumb Shandilya 🦀@krypticmouse·
DSRs, @DSPyOSS for Rust, is here🚀 Happy to finally announce the stable release of DSRs. Over the past few months, I’ve been building DSRs with incredible support and contributions from Maguire Papay, @tech_optimist, and @joshmo_dev. A big shout-out to @lateinteraction and @ChenMoneyQ, who were the first people to hear my frequent rants about this!! Couldn't have done this without all of them.

DSRs originally started as a passion project to explore true compilation, and as it progressed I saw it becoming more. I can’t wait to see what the community builds with it.

DSRs is a three-phase project:

1. API stabilization. We are nearly done with this; it was mostly implementing the API design. We kept the DSPy style in mind and stayed close to it so it's easier to onboard, while also trying to make it a bit more idiomatic and intuitive!

2. Performance optimization, with benchmarking vs DSPy. With the API design finalized, we want to benchmark LLM performance against DSPy and improve performance on every front: we'll cut latency and improve the templates and optimizers in DSRs.

3. True module compilation. Why optimize only the signature when you can optimize and fuse much more? That's the idea behind the final phase of DSRs: a true LLM workflow compiler. More on this after Phase 2.

Really grateful to @PrimeIntellect for offering compute to drive the Phase 2 and 3 experimentation! Big shoutout to them and @johannes_hage for this!!!

But what is DSRs? What does it offer? Let's see.
Herumb Shandilya 🦀@krypticmouse·
Some string compaction and unsafe code (might dump this)...
Herumb Shandilya 🦀@krypticmouse·
Kinda amazing what a simple data structure switch can do 🙂
Swayam Singh@swayaminsync·
Developing Benchmarks: A First-Time Parent's Guide
1️⃣ Think through everything that can go wrong during a run (out-of-context, invalid parsing, no-response, etc.) and raise loggable errors for each
2️⃣ If possible, make the setup able to run concurrently with multiple threads/processes
3️⃣ Implement checkpointing to resume a left-off run
4️⃣ Pin every dependency version, model checkpoint hash, and random seed
5️⃣ Log token counts (input/output) per sample
6️⃣ Log all events to a file (every single one)
7️⃣ Define a retry policy with exponential backoff for transient failures
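Items 3️⃣ and 7️⃣ above can be sketched in a few lines of Python. This is a minimal, hypothetical sketch, not code from any particular harness; the names `retry_with_backoff` and `Checkpointer` are my own:

```python
import json
import os
import random
import time


def retry_with_backoff(fn, max_attempts=5, base_delay=0.01, sleep=time.sleep):
    """Retry a flaky call with exponential backoff plus jitter (item 7)."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure
            # Delay grows as base * 2^attempt, with random jitter added
            sleep(base_delay * (2 ** attempt) * (1 + random.random()))


class Checkpointer:
    """Minimal append-only checkpoint file for resuming a run (item 3)."""

    def __init__(self, path):
        self.path = path
        self.done = set()
        if os.path.exists(path):
            # Reload previously completed sample ids on resume
            with open(path) as f:
                self.done = {json.loads(line)["id"] for line in f}

    def completed(self, sample_id):
        return sample_id in self.done

    def record(self, sample_id, result):
        # Append one JSON line per finished sample
        with open(self.path, "a") as f:
            f.write(json.dumps({"id": sample_id, "result": result}) + "\n")
        self.done.add(sample_id)
```

The append-only JSONL format means a crashed run loses at most the in-flight sample; on restart, `completed()` lets the loop skip everything already recorded.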
Herumb Shandilya 🦀 retweeted
Joel Dierkes@joeldierkes·
Mixedbread just made 115h of videos accessible to my agent. With the new @mixedbreadai v3 release, you can upload any video to your Mixedbread store and make its content accessible to your agent.
Dhravya Shah@DhravyaShah·
BTW, this is exactly what's going on in the memory/retrieval space. Everyone's freakin' lying; we're trying to fix it with memorybench.
Ara@arafatkatze

Turns out @openblocklabs is a complete fraud who gamed their Terminal bench SOTA score. They cheated by putting the result verifier values INSIDE the binary before running the eval and then publicly reported that score as their SOTA score. Read the breakdown here

Ben Clavié@bclavie·
I'm so excited to introduce this! We've worked on a million different moving parts to produce this. I'm fairly confident it's the best multimodal model that exists, period -- and it's not too shabby at pushing back the LIMITs of retrieval either...
Mixedbread@mixedbreadai

Introducing Mixedbread Wholembed v3, our new SOTA retrieval model across all modalities and 100+ languages. Wholembed v3 brings best-in-class search to text, audio, images, PDFs, videos... You can now get the best retrieval performance on your data, no matter its format.

Herumb Shandilya 🦀@krypticmouse·
Working on OpenJarvis was the most fun I've had, and I learned so much from this project! So happy it's finally public! Give it a try and let us know any feedback you have!
Jon Saad-Falcon@JonSaadFalcon

Personal AI should run on your personal devices. So, we built OpenJarvis: a personal AI that lives, learns, and works on-device. Try it today and top the OpenJarvis Leaderboard for a chance to win a Mac Mini! Collab w/ @Avanika15, John Hennessy, @HazyResearch, and @Azaliamirh. Details in thread.

Herumb Shandilya 🦀 retweeted
Mixedbread@mixedbreadai·
Introducing Mixedbread Wholembed v3, our new SOTA retrieval model across all modalities and 100+ languages. Wholembed v3 brings best-in-class search to text, audio, images, PDFs, videos... You can now get the best retrieval performance on your data, no matter its format.
Aamir@aaxsh18·
too many people claiming sota these days...
Herumb Shandilya 🦀@krypticmouse·
Idk much about vague posting but... `synapse apply examples/supermemory.mnm` `synapse apply examples/zep.mnm` `synapse apply examples/letta.mnm` Memory is Retrieval. soon.
Omar Khattab@lateinteraction·
judging by emails, seemingly every other lab is trying to hire leads for their search teams now; it kind of feels late for that?
Drew Breunig@dbreunig·
On March 18th, we're hosting another Bay Area DSPy Meetup featuring in-production case studies involving GEPA, tool use, and LLM judges from Dropbox and Shopify. (And we'll talk RLMs, too.) Join us! luma.com/je6ewmkx
Herumb Shandilya 🦀 retweeted
Jon Saad-Falcon@JonSaadFalcon·
With intelligence-per-watt (IPW), we propose a unified metric for measuring intelligence efficiency, capturing both the LM capabilities delivered and the energy required to power the AI stack, enabling a better understanding of how we scale local and cloud LLMs. Honored to be part of Slingshots // TWO! It's been a blast working with @LaudeInstitute on the IPW project. Big thanks to @andykonwinski @bradenjhancock @ChrisRytting and the whole Laude team for all the support!
Laude Institute@LaudeInstitute

Intelligence-Per-Watt/@JonSaadFalcon @Avanika15 John Hennessy @hazyresearch @Azaliamirh (@Stanford) - Most queries don't need frontier-model horsepower. This work makes "use the right model for the job" a measurable strategy, quantifying when smaller local models can match frontier quality while cutting energy, cost, and compute.
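Taking the metric's name at face value, intelligence-per-watt divides delivered capability by average power draw. The formula below is my own illustrative reading, not necessarily the paper's exact definition, and the numbers are made up:

```python
def intelligence_per_watt(accuracy: float, avg_power_watts: float) -> float:
    """Toy IPW: task capability per watt of average power.

    Illustrative only -- the actual IPW metric may be defined
    differently (e.g., over full-stack energy per query).
    """
    return accuracy / avg_power_watts


# Made-up numbers: a small local model vs a frontier cloud deployment.
local = intelligence_per_watt(accuracy=0.78, avg_power_watts=35.0)   # laptop-class
cloud = intelligence_per_watt(accuracy=0.85, avg_power_watts=700.0)  # server-class
# In this toy comparison, the local model delivers far more
# intelligence per watt despite its lower raw accuracy.
```

This is the intuition behind "use the right model for the job": a modest accuracy gap can be dwarfed by an order-of-magnitude power gap.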
