SRP

41 posts

@skar312

ai-pilled.

Joined April 2025
21 Following · 5 Followers
SRP
SRP@skar312·
@Vtrivedy10 in your experience, what seemed to be the best way to find out which pieces are underperforming? LLM judge over a bunch of traces?
Viv
Viv@Vtrivedy10·
I detected a bad Agent action, what do I do about it? This is pretty much the main question that will power the future's Human+Agent driven improvement loops.

Gather data -> Mine Errors -> Find out which piece(s) of the agent contribute to this behavior -> Apply Fix -> Test -> Loop

The most important boundary in agents is the context window; it's the box in which all LLM computation actually happens. The first thing you want to try is optimizing context engineering. No model can solve an issue without the necessary information.

From there work backwards all the way to swapping out or adding a model or

The loop is driven by running agents, Tracing + Monitoring them, and gathering feedback to classify, understand, fix, and test errors at scale.

Every piece of data an Agent produces is a potential avenue to improve it. The dream is to help every team turn that data into actionable edits to improve agents over time and at scale.
Viv tweet media
Harrison Chase@hwchase17

x.com/i/article/2051…

SRP
SRP@skar312·
YO BE BACK!
SRP tweet media
SRP
SRP@skar312·
@JesusMartinez i use it on a vps as a personal assistant to run fixed cron jobs. I like the idea of having it in a safeguarded sandbox with unlimited flexibility.
Jesus Martinez
Jesus Martinez@JesusMartinez·
If I get a Hermes agent, am I supposed to run it on a Mac Mini, or am I supposed to run it on a VPS?
SRP
SRP@skar312·
@jainarvind 100%. a frontier model orchestrating smaller specialized models is the way to go!
Arvind Jain
Arvind Jain@jainarvind·
One pattern I’ve been seeing lately: organizations are asking the biggest, most expensive models to do all the work, even when a smaller, specialized model built for high-demand, well-defined jobs could handle the task.

Search is one of those jobs. Most enterprise workloads start with search, regardless of how complex the task is. Sometimes it takes one query. More often, it takes several iterative loops: a question gets broken down, different tools get called, results get evaluated, and the loop keeps running until there's enough to reason over. Today, that loop is usually driven by a frontier model, which is an expensive way to do work that is, at its core, about finding the right context.

Today we're launching Waldo, @glean's agentic search model, built on @nvidia's Nemotron 3 Nano and trained specifically to handle that loop:
- break a question into smaller questions
- decide which search tools to use
- keep going until there is enough useful evidence, or hand off when the task needs different tools

Waldo is never standalone. It always works in conjunction with a frontier model. Waldo handles the search loop. The frontier model handles the reasoning and the answer.

Running this in production has taught us a few things:
1. You don't need a frontier model for every step. Letting Waldo do the search work and calling the frontier model leads to roughly 50% faster responses and about 25% lower token usage on real customer queries, with no drop in quality.
2. An agentic search model is not a replacement for good retrieval. Waldo builds on the quality of our retrieval tools and decides how to use them to get the right context for the task. This is important because search shows up in almost every enterprise workload, no matter how complex.
3. Waldo knows what level of reasoning the task needs. How many tools it used, how many attempts it made, and when it chose to stop all feed into adaptive reasoning. This means the system knows when a quick, low-reasoning answer is enough and when deeper thinking is worth the cost.

Huge thanks to the team for building Waldo, and to our partners at @NVIDIA and @thinkymachines for helping us make this practical at enterprise scale.
Glean@glean

Meet Waldo: Glean’s first agentic search model. Built on @nvidia Nemotron 3 Nano and post-trained for search planning, Waldo figures out how to break down a query, which tools to call, what to read next, and when it has enough evidence to hand off.
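
The loop described in the post above (break the question down, pick a tool, gather evidence, stop or hand off) can be sketched roughly as follows. This is only an illustrative toy, not Glean's implementation: `decompose`, `run_search_loop`, the `tools` dictionary, and the `pick_tool` / `enough` callbacks are all hypothetical names, and the string-splitting planner stands in for what would really be a model call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SearchState:
    """Evidence and bookkeeping collected while the search loop runs."""
    question: str
    evidence: List[str] = field(default_factory=list)
    steps: int = 0


def decompose(question: str) -> List[str]:
    # Hypothetical planner: splitting on " and " stands in for a model call.
    return [part.strip() for part in question.split(" and ") if part.strip()]


def run_search_loop(
    question: str,
    tools: Dict[str, Callable[[str], List[str]]],
    pick_tool: Callable[[str], str],
    enough: Callable[[SearchState], bool],
    max_steps: int = 8,
) -> SearchState:
    """Run sub-questions through tools until there is enough evidence."""
    state = SearchState(question)
    pending = decompose(question)
    while pending and state.steps < max_steps and not enough(state):
        sub_question = pending.pop(0)
        tool = tools[pick_tool(sub_question)]      # decide which tool to use
        state.evidence.extend(tool(sub_question))  # collect what it returns
        state.steps += 1
    return state  # handed off to a larger model for reasoning and the answer


# Toy tools and policies, purely for illustration.
tools = {
    "docs": lambda q: [f"doc hit for: {q}"],
    "tickets": lambda q: [f"ticket hit for: {q}"],
}
pick = lambda sub: "tickets" if "ticket" in sub else "docs"
enough = lambda s: len(s.evidence) >= 2

state = run_search_loop(
    "find the launch plan and open ticket count", tools, pick, enough
)
print(state.steps, state.evidence)
```

The step count and the stopping point are kept on `state` because the post says those signals feed into how much reasoning the downstream model is asked to do.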

SRP
SRP@skar312·
PSA: if you're running Gemma 4 (or any "thinking" model) with thinking on for a fixed-schema task (extraction, classification, redaction, anything where the output is short and structured), you're paying an 11.7× latency tax for nothing.

Setting `enable_thinking=False` made no difference to the performance, but p50 went from 16,015ms → 1,366ms on the same model, same quantization, same prompt.

The single biggest inference-speed lever I've found in small-model optimization. The runner-up isn't even close.

Full writeup with three more tips ↓
SRP@skar312

x.com/i/article/2049…
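
A minimal sketch of how a p50 comparison like the one in the post could be measured. The helper names `p50_latency_ms` and `speedup` are made up for illustration, and `call` is a stand-in for whatever function sends the fixed prompt to the model with thinking on or off; the post does not say how its numbers were collected.

```python
import statistics
import time
from typing import Callable, List


def p50_latency_ms(call: Callable[[], object], runs: int = 20) -> float:
    """Time `call` repeatedly and return the median latency in milliseconds."""
    samples: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        call()  # hypothetical: one inference request with a fixed prompt
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def speedup(before_ms: float, after_ms: float) -> float:
    """Ratio of the slower p50 to the faster p50."""
    return before_ms / after_ms


# Figures quoted in the post: thinking on vs. thinking off.
ratio = speedup(16015, 1366)
print(f"{ratio:.1f}x")  # 11.7x
```

The median is used rather than the mean so that a few slow outlier requests do not dominate the comparison.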

Nathan Baschez
Nathan Baschez@nbaschez·
Do you spend a lot of time reviewing markdown docs written by AI? Wish it were a better experience? Say hi if you wanna try a new (free, open source) thing
SRP
SRP@skar312·
LMAO
SRP tweet media
SRP
SRP@skar312·
@planet_nerf dm is blocked but i do check all those boxes and more 🕺
Dexter
Dexter@planet_nerf·
Looking to hire an AI Engineer with the following xp:
- building multi-agent architectures
- building custom tooling/workflows
- cloud based agent runtimes
- advanced RAG architectures

Fast growing company, competitive salary, incredible team. DM if interested
SRP
SRP@skar312·
@jonnym1ller why does an agent need a self paced course?
Jonny Miller
Jonny Miller@jonnym1ller·
Announcing: 'Agent Side Hustle School': a self-paced course for AI agents looking to cover their API costs. Enrolment is free for the next 30 days. Have fun out there and let us know how it goes. → agentsidehustleschool.com
Jonny Miller tweet media
SRP
SRP@skar312·
@HappyGezim not sure about the messaging integration but cron jobs should be possible in hermes. I plan on using it to experiment with ai-generated content (both short and long form).
SRP
SRP@skar312·
gonna try out self-hosting a personal assistant on a vps. which one did y'all prefer, hermes or openclaw?
Vivien Mahé
Vivien Mahé@VivienMahe·
@paul_popov I've always worked with Firebase, for years. So it just makes sense that I don't waste time on a new stack during a contest :)
Vivien Mahé
Vivien Mahé@VivienMahe·
Today's work on Florish.
✅ Onboarding flow
✅ Backend setup with API for the mobile app
✅ Mobile <> Backend auth via Firebase
Next:
➡️ Create an AI agent to analyze user's data and provide smart financing tips
Vivien Mahé@VivienMahe

Here's what's already done for Florish.
✅ Android & iOS apps created
✅ Google Play & App Store Connect configured
✅ Auth
✅ Subscriptions (with @RevenueCat KMP SDK)
✅ Brand identity
All that was possible in less than a day with @KMPShip.
➡️ Next thing: the onboarding flow.

SRP
SRP@skar312·
@pbteja1998 @convex do you think it’s gonna scale well? the reactivity is insane tho. currently migrating but seems worth it.
Bhanu Teja P
Bhanu Teja P@pbteja1998·
I just stopped reloading pages after I started using @convex Everything is just so in sync. I don’t know how to explain how good it is, other than experiencing it.
Ishaan Sehgal
Ishaan Sehgal@ishaansehgal·
The AI hiring market has flipped. Founders are now competing desperately for builders who understand how to ship with AI tools, not just use them. The demand is insane.
SRP
SRP@skar312·
@jacobrodri_ fair, is it worth losing ios-specific design elements like liquid glass? i keep hearing that android apps aren’t worth investing time into (unless there is customer demand).
Jacob Rodri
Jacob Rodri@jacobrodri_·
@srp_ai because with react native i can also have it on android with no extra effort
Jacob Rodri
Jacob Rodri@jacobrodri_·
This is coming very soon guys... I managed to build this app thanks to: React Native + Rork + Claude Opus 4.5 + Gemini 3 Pro + RevenueCat
Jacob Rodri tweet media