Vhiz

306 posts

@ThereBeLyte

Joined April 2018
523 Following · 22 Followers
Victor M
Victor M@victormustar·
Microsoft did something interesting here 👀 “Unlike typical LLMs that are trained to play the role of the "assistant" in conversation, we trained UserLM-8b to simulate the “user” role in conversation” huggingface.co/microsoft/User…
Vhiz
Vhiz@ThereBeLyte·
@i_am_brennan @gunta85 How do I define a pipeline here? And do I get to see the underlying DSPy code?
Vhiz
Vhiz@ThereBeLyte·
@samsja19 @QuanquanGu Can you please point to resources which talk about SOTA GRPO?
samsja
samsja@samsja19·
@QuanquanGu I would call this a minor issue; the whole idea behind group advantage is still very powerful. Also, SOTA "GRPO" has evolved a lot since then, and it's not that much of an off-policy algorithm anymore.
Quanquan Gu
Quanquan Gu@QuanquanGu·
Agreed. GRPO is technically wrong.
Vhiz
Vhiz@ThereBeLyte·
@athleticKoder Nice thread Anshuman. One question though: metrics like contextual recall would require the ground truth to be present. So how do you do async batch eval of prod traffic?
anshuman
anshuman@athleticKoder·
The question that ends the interview: "How would you implement this evaluation in production?"

Wrong: "Run tests manually"

Right:
- Automated component-level evals in CI/CD
- Real-time monitoring with alerting
- Async batch evaluation of production traffic
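Vhiz's question above (how to compute contextual recall without ground truth at eval time) hinges on what the metric actually measures. A minimal sketch of a contextual-recall-style metric, using a crude word-overlap proxy in place of the LLM judge that real eval frameworks use; all function and parameter names here are hypothetical:

```python
import re
from typing import List

def contextual_recall(ground_truth_facts: List[str],
                      retrieved_chunks: List[str],
                      threshold: float = 0.5) -> float:
    """Fraction of ground-truth facts covered by at least one retrieved chunk.

    Coverage here is a word-overlap proxy; production systems typically
    use an LLM judge instead.
    """
    def words(text: str) -> set:
        return set(re.findall(r"\w+", text.lower()))

    def covered(fact: str) -> bool:
        fact_words = words(fact)
        return any(
            len(fact_words & words(chunk)) / max(len(fact_words), 1) >= threshold
            for chunk in retrieved_chunks
        )

    if not ground_truth_facts:
        return 0.0
    return sum(covered(f) for f in ground_truth_facts) / len(ground_truth_facts)

facts = ["the capital of France is Paris", "Paris is on the Seine"]
chunks = ["Paris, the capital of France, lies on the Seine river."]
print(contextual_recall(facts, chunks))  # 1.0
```

This is exactly why the metric needs labeled ground truth: without `ground_truth_facts`, there is nothing to compute recall against, which is the gap Vhiz is pointing at for async batch eval of production traffic.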
anshuman
anshuman@athleticKoder·
You're in a ML Engineer interview at Perplexity, and the interviewer asks: "Your RAG system is hallucinating in production. How do you diagnose what's broken - the retriever or the generator?" Here's how you can answer:
Vhiz
Vhiz@ThereBeLyte·
@NirantK Hey Nirant, can you please explain the benefits of using DSPy without the optimizers?
Vhiz
Vhiz@ThereBeLyte·
@dbreunig How does one handle edge cases using DSPy? Add them to eval?
Drew Breunig
Drew Breunig@dbreunig·
I got around to kicking the tires on GEPA prompt optimization in DSPy, seeing if it could match the reported gsm8k benchmark for Qwen3-4b-thinking. Started with the simplest signature: qa_bot = dspy.Predict('question -> answer') GEPA got it from 67.2% to 92.8%.
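For context on what an optimizer like GEPA is scoring against: gsm8k answers are typically graded by comparing the final number in the model's output to the gold answer. A minimal sketch of such a metric (not DSPy's actual implementation; names are hypothetical):

```python
import re

def gsm8k_exact_match(gold_answer: str, predicted_answer: str) -> bool:
    """Compare the last numeric value appearing in each answer string."""
    def last_number(text: str):
        # Strip thousands separators, then grab all ints/decimals.
        nums = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
        return float(nums[-1]) if nums else None

    gold, pred = last_number(gold_answer), last_number(predicted_answer)
    return gold is not None and gold == pred

print(gsm8k_exact_match("The answer is 42.", "So we get 6*7 = 42"))  # True
```

A metric this simple is all the optimizer needs: GEPA mutates the prompt and keeps variants that raise the metric's pass rate on the dev set.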
Vhiz retweeted
Yam Marcovic
Yam Marcovic@ymarcov·
Parlant 3.0 is officially released!

Proud to announce our most significant framework overhaul since its inception. See what it's all about at Parlant(io).

This version transforms Parlant into a production-ready conversational AI framework for customer-facing applications. With dramatic performance improvements, reworked developer experience, and enterprise-grade security features, Parlant 3.0 is ready to fix your hardest AI consistency issues and power your most critical customer-facing applications.

"By far the most elegant conversational AI framework that I have come across! Developing using Parlant is pure joy!" — Vishal Ahuja, JPMorgan Chase

I know many have awaited the improvements in this new release — and many have helped test it. We do it all for our users, as we've become obsessed with the massive challenge of getting customer-facing AI agents under control. Parlant (@EmcieCo) is continuing to lead this endeavor.

If you haven't tried it yet, you should, as it's like nothing you've already tried. Check out the blog below for a summary of the release and what's coming next (1st comment)
Vhiz
Vhiz@ThereBeLyte·
@SkodaIndia - 14k kms and the clutch is not working. Skoda Kushaq top-end model. And the dealer is charging 36k. I haven't heard of a clutch having to be replaced after just 14k kms. This is unacceptable.
Vhiz
Vhiz@ThereBeLyte·
@NirantK I tried joining but it didn’t work
Nirant
Nirant@NirantK·
I post about niche LLM and Search things on my personal WhatsApp channel in case you want to get it even without depending on the Twitter algorithm! whatsapp.com/channel/0029Va…
Nirant
Nirant@NirantK·
Apple just dropped a killer open-source visualization tool for embeddings — Embedding Atlas — and it's surprisingly powerful for anyone working with large text+metadata datasets. This reminds me of Nomic's Atlas, but I never got around to using it 😅

We're talking real-time search, multi-million point rendering, and automatic clustering with labels. One of their showcase examples visualizes ~200K wine reviews using embeddings + metadata like price, country, and tasting notes. And it is lightning fast even in my browser! No separate code needed!

It nails what most LLM devs need but often hack together:
✅ UMAP projections
✅ Faceted search across metadata (e.g. "country vs. price")
✅ Hover + tooltip on raw points
✅ Interactive filters, histograms, and cluster overlays
✅ Cross-linked scatterplot + table views

Under the hood:
• Fast rendering using WebGPU (with WebGL fallback)
• Embedding-based semantic similarity search
• Kernel density contours for spotting clusters or outliers

You just upload your .jsonl or .csv with text + vector + metadata. It handles the rest: clustering, labeling, UI layout, everything.

This feels like the LLM-native version of Tableau — but optimized for text, chat and modern data needs.

If you're building RAG evals, search tuning, clustering explainability, or even dataset audits — this could be your new favorite tool.
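As a rough illustration of the upload format the tweet describes, here is how one might write such a .jsonl file. The exact field names Embedding Atlas expects may differ, so treat this schema (text + vector + metadata per row) as an assumption and check the tool's docs:

```python
import json
import random

# Hypothetical schema: one JSON object per line with the review text,
# a precomputed embedding vector, and arbitrary metadata columns.
rows = []
for i in range(3):
    rows.append({
        "text": f"Wine review #{i}: bright acidity, notes of citrus.",
        "embedding": [round(random.random(), 4) for _ in range(8)],  # toy 8-dim vector
        "country": "France",
        "price": 20 + i,
    })

with open("reviews.jsonl", "w") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")
```

In practice the embedding column would come from a real embedding model rather than `random.random()`; the point is just the one-object-per-line shape with text, vector, and metadata side by side.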
Vhiz
Vhiz@ThereBeLyte·
@MaximeRivest Can you please share the code?
Maxime Rivest 🧙‍♂️🦙🐧
Maxime Rivest 🧙‍♂️🦙🐧@MaximeRivest·
I love DSPy! In about 3 hours of coding and running optimizers, I have now found 3 optimized prompts (for three classifiers) that produce, 99% of the time, the same classification as a majority-vote classification from kimi-k2, llama-4-maverick and qwen3-235b!! But with Gemma 3n E4B, which you can run on a laptop. Pretty nice! First trials for models this size, without optimization of prompt + model + task, were at ~50%! Each of the 3 big teacher models on their own was not even producing those results.
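The majority-vote labeling Maxime describes (three teacher models vote, the winner becomes the target label) can be sketched in a few lines. The model names below are just dictionary keys, not API calls:

```python
from collections import Counter

def majority_vote(labels):
    """Return the most common label among teacher-model predictions.

    Ties resolve to the earliest-seen label, since Counter preserves
    insertion order in CPython 3.7+.
    """
    return Counter(labels).most_common(1)[0][0]

# Hypothetical predictions from the three teacher models for one input:
teacher_preds = {
    "kimi-k2": "positive",
    "llama-4-maverick": "positive",
    "qwen3-235b": "negative",
}
print(majority_vote(teacher_preds.values()))  # positive
```

Running this over a dataset gives the consensus labels that a small student model (here, a prompt-optimized Gemma 3n E4B) can then be evaluated or tuned against.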
Vhiz
Vhiz@ThereBeLyte·
@MaximeRivest Can you please share the notebook/code by any chance?
Vhiz
Vhiz@ThereBeLyte·
@vivekkalyansk Excellent writeup! Keep them coming!
Vivek Kalyan
Vivek Kalyan@vivekkalyansk·
RAG is plagued by teams approaching it as an engineering problem: evaluating chunking strategies, vector databases, prompting. I wrote in-depth about this here: vivekkalyan.com/writing/scalin…
Vivek Kalyan
Vivek Kalyan@vivekkalyansk·
I presented at MLSG yesterday: How I trained a legal search agent that outperforms o3 using RL on a single H100 in 9 hours. I believe this is a shift in how we build AI Agents
will brown
will brown@willccbb·
i'm teaming up with @corbtt from openpipe to teach a class about agents + RL :)

we'll be teaching the class on @MavenHQ starting june 16. as far as we know, this is the first course of its kind anywhere to bridge RL + LLM agents, and we're really excited to share some of our favorite tips and tricks with you all.

both kyle and i maintain toolkits -- ART and verifiers -- for working with agents + RL, and the course will include coverage of both in different capacities, along with other popular agent tools (MCP, smolagents, etc). the goal is that you come away from the course feeling fully equipped to train your own custom agent models with RL.

the headline pricing for the course is targeted at industry professionals. part of this is about managing bandwidth for facetime -- the course will be highly interactive, and we want to ensure we can accommodate everyone's questions. however, we also want this to be very broadly accessible to anyone who stands to gain from a deep dive into the material.

while we want everyone in the course to have at least some "skin in the game" to incentivize active participation + attendance, we are more than happy to give very steep discounts, particularly to current students / self-funded founders / independent researchers. email me at will@[…] with subject line "Agentic RL Course" and some info about your background and i'll get you a code.

as a preview, i'm teaching a free lightning lesson this thursday, may 29 (3PM PT) about building your own "deep research"-style search agent. we'll be doing a couple of these in the lead-up to the course if you're on the fence, or just want a smaller dose of content.

the course will be a useful structure for creating educational materials (notes, notebooks, etc.) that can be shared more broadly, for free -- maven's approach towards letting instructors own their materials while also offering really great course-hosting infrastructure was a key part of why we decided to work with them.

more info below!

see you in class 🤓
Rogerio Chaves
Rogerio Chaves@_rchaves_·
I'm still expanding on create-agent-app, so far I built the same customer service agent in 6 different agent frameworks, very simple agent but it was already enough to have a good feeling of each framework dev experience let me share my experiences so far 🧵
dex
dex@dexhorthy·
may have hit a nerve here... it all started with trying to understand how production AI systems actually work.

FOLKS - I've tried every agent framework out there and talked to many strong founders building impressive things with AI.

BUT I was surprised to find that most successful AI systems aren't following the "here's your prompt, here's a bag of tools" pattern
► they're mostly just well-engineered software with LLM capabilities integrated at key points

The companies shipping high-quality AI aren't building monolithic "agents" from scratch
► they're incorporating small, focused llm loops that do one thing well

so I set out to document the principles for building production-grade LLM applications in "12-factor-Agents"

We were on the front page of HN all day on wednesday, with great discussion from the community.

check it out below 👇 and let me know what factors you think are missing
Kyle🤖🚀🦭
Kyle🤖🚀🦭@KyleMorgenstein·
@inceptmyth absolutely, this list was just for people that are trying to speed run policy optimization to play with GRPO, this list is obviously not comprehensive of all RL.
Kyle🤖🚀🦭
Kyle🤖🚀🦭@KyleMorgenstein·
if you start with GRPO you're cooked

if you want to understand RL*, start with the policy gradient theorem. then natural policy gradient, generalized advantage estimation, trust region policy optimization, proximal policy optimization, and then group relative policy optimization
xjdr@_xjdr

If I were you I'd be studying either RL (starting with GRPO) or PTX (starting with cuda). If I were much younger me I'd be studying my ass off in both subjects plus MuZero and training 0.5B models every day on my 4090
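For readers speed-running the list above, the "group relative" idea at the heart of GRPO is simple to sketch: sample several completions per prompt, score them, and normalize each reward against the group's mean and standard deviation. A minimal, simplified sketch; real implementations add PPO-style ratio clipping, KL penalties, and token-level weighting:

```python
def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each completion's reward against its group's statistics.

    This is the group-relative baseline GRPO uses in place of a learned
    value function (simplified: no clipping, no KL term).
    """
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# One prompt, four sampled completions scored 0/1 by a verifier:
rewards = [1.0, 0.0, 0.0, 1.0]
print(group_relative_advantages(rewards))  # approximately [1, -1, -1, 1]
```

Completions that beat their group's average get positive advantages (their tokens are reinforced); below-average completions get negative ones, with no critic network needed.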

Vhiz
Vhiz@ThereBeLyte·
@jsuarez @natolambert Any reading recommendations I can go through to quickly get started with RL on LLMs?
Joseph Suarez 🐡
Joseph Suarez 🐡@jsuarez·
@natolambert RL is going to be sane and consistent this year, and it's going to be done open source in PufferLib
Nathan Lambert
Nathan Lambert@natolambert·
New talk! I wanted to make space to ask: Where is this new wave of RL interest going? How does this compare to when we "rediscovered" RLHF post-ChatGPT with Alpaca etc? What ingredients make this different? How can RL and post-training become just "training"?
Interconnects@interconnectsai

An unexpected RL Renaissance

New talk! Forecasting the Alpaca moment for reasoning models and why the new style of RL training is a far bigger deal than the emergence of RLHF.

YouTube: youtube.com/watch?v=YXTYbr…
Slides: docs.google.com/presentation/d…
More info: interconnects.ai/p/an-unexpecte…

Vhiz
Vhiz@ThereBeLyte·
@m_att_dunn @jenboland @pelaseyed @nlpnoah Thanks Matt! I read the paper. Wondering how one would handle scenarios where the query requires multiple documents? Won't it be very hard to fine-tune for such scenarios, because there could be lots of combinations?
Matt Dunn
Matt Dunn@m_att_dunn·
@jenboland @pelaseyed No. Let the LLM sample from the corpus directly. The caveat being if you work at a super large document scale.
Vhiz
Vhiz@ThereBeLyte·
@Mango I did it yesterday. Please check.
MANGO
MANGO@Mango·
@ThereBeLyte Hello. Please send us your full name, email address and order number via direct message so that we can assist you further. Many thanks.
Vhiz
Vhiz@ThereBeLyte·
@Mango - I need a refund