SRP

41 posts

@skar312

ai-pilled.

Joined April 2025

21 Following · 5 Followers

SRP@skar312·6d

@Vtrivedy10 in your experience, what seemed to be the best way to find out the pieces that are underperforming? LLM judge over a bunch of traces?

Viv@Vtrivedy10·6d

I detected a bad Agent action, what do I do about it? This is pretty much the main question that will power the future’s Human+Agent driven improvement loops.
Gather data -> Mine Errors -> Find out which piece(s) of the agent contribute to this behavior -> Apply Fix -> Test -> Loop
The most important boundary in agents is the context window; it’s the box in which all LLM computation actually happens. The first thing you want to try is optimizing context engineering. No model can solve an issue without the necessary information.
From there work backwards all the way to swapping out or adding a model or
The loop is driven by running agents, Tracing + Monitoring them, and gathering feedback to classify, understand, fix, and test errors at scale.
Every piece of data an Agent produces is a potential avenue to improve it. The dream is to help every team turn that data into actionable edits to improve agents over time and at scale.
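A minimal sketch of the "Mine Errors -> Find out which piece(s)" steps from the post above, assuming each trace step already carries a pass/fail verdict from some judge (human or LLM). The `Trace` fields, the component labels, and the sample data are all hypothetical; this is not any particular tracing product's schema.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Trace:
    trace_id: str
    component: str  # hypothetical label for the agent piece that produced this step
    output: str
    ok: bool        # judge verdict (human or LLM); hard-coded here for illustration


def mine_errors(traces):
    """Keep only the traces that the judge marked as failures."""
    return [t for t in traces if not t.ok]


def rank_components(traces):
    """Rank components by failure rate, worst first."""
    totals = Counter(t.component for t in traces)
    fails = Counter(t.component for t in mine_errors(traces))
    rates = {c: fails[c] / totals[c] for c in totals}
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)


# Made-up sample data, standing in for gathered traces.
traces = [
    Trace("t1", "retrieval", "no docs found", ok=False),
    Trace("t2", "retrieval", "3 docs", ok=True),
    Trace("t3", "planner", "plan A", ok=True),
    Trace("t4", "retrieval", "wrong docs", ok=False),
    Trace("t5", "tool_call", "timeout", ok=False),
    Trace("t6", "tool_call", "200 OK", ok=True),
]

ranking = rank_components(traces)
for component, rate in ranking:
    print(f"{component}: {rate:.0%} of steps failed")
```

The worst-ranked component is where a fix would be tried first; the "Apply Fix -> Test -> Loop" steps would then re-run the same ranking on fresh traces.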

Harrison Chase@hwchase17

x.com/i/article/2051…

SRP@skar312·1 May

biggest unlock for me has been Hermes + GPT 5.5 hosted in a vps/tailscale. It runs 24/7 and I don’t have to worry about a thing.

BentoBoi@BentoBoiNFT

Day 35 of speed running AI and becoming an expert as a noob
✅ Migrated OpenClaw → Hermes
✅ Shipped a video content agent
✅ Worked on Real Estate startup
Hermes Agent + GPT 5.5 is a big upgrade from my old OpenClaw setup. Let’s run it up again and learn more today gang 🫡

SRP@skar312·1 May

@Teknium @Tu7uruu @realsigridjin damn, is this the same web-ui mentioned in docs or something new?

Teknium 🪽@Teknium·30 Apr

@Tu7uruu @realsigridjin Will look like this when used:

Sigrid Jin 🌈🙏@realsigridjin·30 Apr

we need hermes integration

Parth Jadhav@ParthJadhav8

This is crazy !! Cursor now has built a Kanban board where you can just drop in tasks and the agent will pick those up and complete them.

SRP@skar312·30 Apr

I’d rather build real products or study system design than waste another hour on leetcode. Most devs would 10x their career by learning:
• Clean architecture & tradeoffs
• Testing & evals
• Shipping fast & owning decisions
…instead of grinding DP algos.

Abhilash Chowdhary@TheChowdhary

I just spoke with a few YC founders, and every single one is now thinking about asking engineers to share their Claude Code sessions before hiring.
I don’t think people realize how big the gap is between an engineer who knows how to use Claude efficiently vs. one who just prompts the same thing from scratch every time.
We had one prospect who used Claude Code + our Crustdata API to build a Clay alternative in under 24 hours, and it was actually good. There are just massive skill gaps here.

SRP@skar312·30 Apr

YO BE BACK!

SRP@skar312·29 Apr

@JesusMartinez i use it on a vps as a personal assistant to run fixed cron jobs. I like the idea of having it in a safeguarded sandbox with unlimited flexibility.

Jesus Martinez@JesusMartinez·27 Apr

If I get a Hermes agent am I supposed to run it on a Mac Mini Or am I supposed to run it on a VPS?

SRP@skar312·29 Apr

@jainarvind 100%. a frontier model orchestrating smaller specialized models is the way to go!

Arvind Jain@jainarvind·28 Apr

One pattern I’ve been seeing lately: organizations are asking the biggest, most expensive models to do all the work, even when a smaller, specialized model built for high-demand, well-defined jobs could handle the task.
Search is one of those jobs. Most enterprise workloads start with search, regardless of how complex the task is. Sometimes it takes one query. More often, it takes several iterative loops: a question gets broken down, different tools get called, results get evaluated, and the loop keeps running until there's enough to reason over. Today, that loop is usually driven by a frontier model, which is an expensive way to do work that is, at its core, about finding the right context.
Today we're launching Waldo, @glean's agentic search model, built on @nvidia's Nemotron 3 Nano and trained specifically to handle that loop:
- break a question into smaller questions
- decide which search tools to use
- keep going until there is enough useful evidence, or hand off when the task needs different tools
Waldo is never standalone. It always works in conjunction with a frontier model. Waldo handles the search loop. The frontier model handles the reasoning and the answer.
Running this in production has taught us a few things:
1. You don't need a frontier model for every step. Letting Waldo do the search work and calling the frontier model leads to roughly 50% faster responses and about 25% lower token usage on real customer queries, with no drop in quality.
2. An agentic search model is not a replacement for good retrieval. Waldo builds on the quality of our retrieval tools and decides how to use them to get the right context for the task. This is important because search shows up in almost every enterprise workload, no matter how complex.
3. Waldo knows what level of reasoning the task needs. How many tools it used, how many attempts it made, and when it chose to stop all feed into adaptive reasoning. This means the system knows when a quick, low-reasoning answer is enough and when deeper thinking is worth the cost.
Huge thanks to the team for building Waldo, and to our partners at @NVIDIA and @thinkymachines for helping us make this practical at enterprise scale.
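The three-step loop described in the post above (break the question down, pick tools, stop or hand off) can be sketched roughly as follows. This is a toy illustration, not Glean's implementation: `decompose`, `search_loop`, the tool names, and the stopping threshold are all hypothetical stand-ins for what a trained model would decide.

```python
def decompose(question):
    """Hypothetical stand-in for query planning: split on ' and '."""
    return [part.strip() for part in question.split(" and ") if part.strip()]


def search_loop(question, tools, enough=2, max_calls=6):
    """Try tools per sub-question until there is enough evidence, else hand off.

    tools: dict mapping a tool name to a callable(query) -> list of strings.
    """
    evidence, calls = [], 0
    for sub in decompose(question):
        for name, tool in tools.items():
            if calls >= max_calls:
                break
            calls += 1
            hits = tool(sub)
            if hits:
                evidence.extend(hits)
                break  # this sub-question has evidence; move to the next one
        if len(evidence) >= enough:
            return {"status": "enough_evidence", "evidence": evidence, "calls": calls}
    # Not enough evidence within budget: hand off to whatever handles the task next.
    return {"status": "hand_off", "evidence": evidence, "calls": calls}


# Toy tools that only "find" something when a keyword matches.
tools = {
    "wiki": lambda q: ["wiki: " + q] if "revenue" in q else [],
    "tickets": lambda q: ["ticket: " + q] if "outage" in q else [],
}

found = search_loop("q3 revenue and last outage", tools)
missed = search_loop("office plants", tools)
print(found["status"], found["calls"])
print(missed["status"], missed["calls"])
```

The `calls` count is the kind of signal the post says feeds into adaptive reasoning: a caller could use it to decide how much reasoning effort to spend on the final answer.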

Glean@glean

Meet Waldo: Glean’s first agentic search model. Built on @nvidia Nemotron 3 Nano and post-trained for search planning, Waldo figures out how to break down a query, which tools to call, what to read next, and when it has enough evidence to hand off.

SRP@skar312·29 Apr

PSA: if you're running Gemma 4 (or any "thinking" model) with thinking on for a fixed-schema task (extraction, classification, redaction, anything where the output is short and structured), you're paying an 11.7× latency tax for nothing.
Setting `enable_thinking=False` made no difference to task performance, but p50 latency went from 16,015ms → 1,366ms on the same model, same quantization, same prompt.
The single biggest inference-speed lever I've found in small-model optimization. The runner-up isn't even close.
Full writeup with three more tips ↓
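For reference, the 11.7× figure follows directly from the two p50 numbers quoted in the post. The variable names below are illustrative only.

```python
# p50 latencies quoted in the post, in milliseconds.
p50_thinking_ms = 16_015
p50_no_thinking_ms = 1_366

# Ratio of the two medians: how many times slower the thinking run is.
speedup = p50_thinking_ms / p50_no_thinking_ms

# Fraction of the original latency that disappears with thinking off.
saved = 1 - p50_no_thinking_ms / p50_thinking_ms

print(f"speedup: {speedup:.1f}x")
print(f"latency saved: {saved:.1%}")
```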

SRP@skar312

x.com/i/article/2049…

SRP@skar312·29 Apr

x.com/i/article/2049…

SRP@skar312·29 Apr

@nbaschez YES!

Nathan Baschez@nbaschez·28 Apr

Do you spend a lot of time reviewing markdown docs written by AI? Wish it were a better experience? Say hi if you wanna try a new (free, open source) thing

SRP@skar312·28 Apr

LMAO

SRP@skar312·14 Apr

@planet_nerf dm is blocked but i do check all those boxes and more 🕺

Dexter@planet_nerf·13 Apr

Looking to hire an AI Engineer with the following xp:
- building multi-agent architectures
- building custom tooling/workflows
- cloud based agent runtimes
- advanced RAG architectures
Fast growing company, competitive salary, incredible team. DM if interested

SRP@skar312·30 Mar

@jonnym1ller why does an agent need a self paced course?

Jonny Miller@jonnym1ller·30 Mar

Announcing: 'Agent Side Hustle School': a self-paced course for AI agents looking to cover their API costs. Enrolment is free for the next 30 days. Have fun out there and let us know how it goes. → agentsidehustleschool.com

SRP@skar312·14 Mar

@HappyGezim not sure about the messaging integration but cron jobs should be possible in hermes. I plan on using it to experiment with ai-generated content (both short and long form).

SRP@skar312·14 Mar

gonna try out self hosting a personal assistant on a vps. which one did y'all prefer, hermes or openclaw?

SRP@skar312·24 Jan

@VivienMahe @paul_popov did you consider convex?

Vivien Mahé@VivienMahe·24 Jan

@paul_popov I've always worked with Firebase, for years. So it just makes sense that I don't waste time on a new stack during a contest :)

Vivien Mahé@VivienMahe·23 Jan

Today's work on Florish.
✅ Onboarding flow
✅ Backend setup with API for the mobile app
✅ Mobile <> Backend auth via Firebase
Next:
➡️ Create an AI agent to analyze user's data and provide smart financing tips

Vivien Mahé@VivienMahe

Here's what's already done for Florish.
✅ Android & iOS apps created
✅ Google Play & App Store Connect configured
✅ Auth
✅ Subscriptions (with @RevenueCat KMP SDK)
✅ Brand identity
All that was possible in less than a day with @KMPShip.
➡️ Next thing: the onboarding flow.

SRP@skar312·24 Jan

@pbteja1998 @convex do you think it’s gonna scale well? the reactivity is insane tho. currently migrating but seems worth it.

Bhanu Teja P@pbteja1998·23 Jan

I just stopped reloading pages after I started using @convex Everything is just so in sync. I don’t know how to explain how good it is, other than experiencing it.

SRP@skar312·21 Jan

@ishaansehgal how would one showcase this?

Ishaan Sehgal@ishaansehgal·20 Jan

The AI hiring market has flipped. Founders are now competing desperately for builders who understand how to ship with AI tools, not just use them. The demand is insane.

SRP@skar312·20 Jan

@jacobrodri_ fair, is it worth losing ios-specific design elements like liquid glass? i keep hearing that android apps aren’t worth investing time into (unless there is customer demand).

Jacob Rodri@jacobrodri_·20 Jan

@srp_ai because with react native i can also have it on android with no extra effort

Jacob Rodri@jacobrodri_·19 Jan

This is coming very soon guys... I managed to build this app thanks to: React Native + Rork + Claude Opus 4.5 + Gemini 3 Pro + RevenueCat
