
Prompter
2.7K posts

@prompter_ai
building automation projects and sharing them with my followers


We built Kairos: AI that learns by watching you work once, then automates it forever. No code. No drag and drop. It's like training a co-worker. Early access is limited. DM us or visit our website.


Tome (the company) is becoming Lightfield. We’re excited to share more with you soon. Stay tuned for updates: lightfield.app


Seems crazy that we still don’t have autonomous charging… everything else is autonomous but a human plugs in the charger… feels solvable

My reaction is that there is an evaluation crisis. I don't really know what metrics to look at right now. MMLU was good and useful for a few years but that's long over. SWE-Bench Verified (real, practical, verified problems) I really like and is great, but it is itself too narrow. Chatbot Arena received so much focus (partly my fault?) that LLM labs have started to really overfit to it, via a combination of prompt mining (from API requests), private evals bombardment, and, worse, explicit use of rankings as training supervision. I think it's still ~ok and there's a lack of "better", but its signal feels like it's on the decline. There are a number of private evals popping up, an ensemble of which might be one promising path forward. In the absence of great comprehensive evals I tried to turn to vibe checks instead, but I now fear they are misleading: there is too much opportunity for confirmation bias, too low a sample size, etc. It's just not great. TLDR my reaction is I don't really know how good these models are right now.


Introducing Autodelve 🤖⚡️ YC-backed AI customer support tools quoted us nearly $20,000/yr for a simple Discord Q&A bot. So we built our own with @cursor_ai in 2 hours and made it open source. Fork the repo, index your docs site, and get answers in Discord in under 5 minutes.




