RELAI (@ReliableAI) - Twitter profile

Pinned Tweet

RELAI@ReliableAI·Oct 27

🚀 RELAI is live: a platform for building reliable AI agents

🔁 We complete the learning loop for agents: simulate → evaluate → optimize

- Simulate with LLM personas, mocked MCP servers/tools and grounded synthetic data
- Evaluate with code + LLM evaluators; turn human reviews into optimization-ready benchmarks
- Optimize with Maestro; tune prompts, configs and even agent graph for improved quality, cost and latency

Works with OpenAI Agents SDK, Google ADK, LangGraph, and all other agent frameworks

🌐 Get started (free): relai.ai
⭐ Open-source SDK: github.com/relai-ai/relai…

RELAI@ReliableAI·Nov 19

A notebook showing how to optimize AI agents using user feedback:

Soheil Feizi@FeiziSoheil

How to use expert feedback to optimize AI agents?

In many real-world applications, there is no clear ground truth label for what a “good” agent response is. Often, all we have is user feedback and preferences (“this is wrong”, “missing context”, “too verbose”, etc.). This feedback is an extremely valuable supervision signal, but turning it into effective optimization of agent behavior is not straightforward:

Stochasticity & replay
To learn from feedback, we often need to “replay” the original sample or trace. But agentic systems (with tools, RAG, branching, etc.) are stochastic, so re-running the same input may not reproduce the same trajectory or output.

Linking feedback to replays
Even if we can approximate the original run, evaluating a new or re-played trace against the old feedback is non-trivial. The feedback is textual, often high-level and contextual, not a simple scalar reward.

Optimizing config and structure
Finally, we want to optimize both the agent configuration (prompts, hyperparameters, tools, thresholds) and the agent graph/structure (which nodes, in what order, with what routing). Jointly optimizing these under noisy, text-based feedback is a challenging learning and search problem.

In this notebook, using an agentic RAG example, we show how to operationalize this:

📝 Convert user feedback on agentic runs into an annotation benchmark on RELAI
🎯 Use the Maestro agent optimizer to consume that benchmark and automatically improve both the config and the graph of the agent
🔁 Close the loop from user preference → benchmark → optimization → better agent in a reproducible, data-driven way

🔗 Notebook: colab.research.google.com/drive/1QtWbiGH…

Powered by @ReliableAI (relai.ai)
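The replay problem described in this thread can be illustrated with a small, self-contained sketch. It is not the RELAI SDK or Maestro: `FeedbackItem`, `BenchmarkEntry`, `to_benchmark`, `evaluate`, the toy agent, and the toy judge are all hypothetical names invented here. The idea shown is only that each feedback comment becomes a benchmark criterion, and that a stochastic agent is replayed several times and averaged rather than scored on a single re-run.

```python
import random
from dataclasses import dataclass
from statistics import mean
from typing import Callable


@dataclass
class FeedbackItem:
    """One piece of user feedback on a past agent run (hypothetical schema)."""
    question: str
    original_answer: str
    feedback: str  # free text, e.g. "too verbose"


@dataclass
class BenchmarkEntry:
    """A replayable benchmark sample derived from a feedback item."""
    question: str
    criterion: str  # what a replayed answer is judged against


def to_benchmark(items: list[FeedbackItem]) -> list[BenchmarkEntry]:
    # Keep the input and the reviewer's comment; drop the old output,
    # because a replay will generally not reproduce it.
    return [BenchmarkEntry(i.question, i.feedback) for i in items]


def evaluate(
    agent: Callable[[str], str],
    judge: Callable[[str, str], float],
    benchmark: list[BenchmarkEntry],
    n_replays: int = 5,
) -> float:
    # Replay every sample several times and average the scores, since one
    # re-run of a stochastic agent is a noisy estimate of its behavior.
    scores = []
    for entry in benchmark:
        for _ in range(n_replays):
            scores.append(judge(entry.criterion, agent(entry.question)))
    return mean(scores)


def make_toy_agent(verbose_prob: float, seed: int = 0) -> Callable[[str], str]:
    """Toy stochastic agent: rambles with probability `verbose_prob`."""
    rng = random.Random(seed)

    def agent(question: str) -> str:
        return "word " * 50 if rng.random() < verbose_prob else "short answer"

    return agent


def toy_judge(criterion: str, answer: str) -> float:
    # Stand-in for an LLM judge; it only understands "too verbose".
    if criterion == "too verbose":
        return 1.0 if len(answer.split()) <= 20 else 0.0
    return 0.5
```

Under these assumptions, an optimizer would compare `evaluate(...)` across candidate agent configurations and keep the one with the higher average; in practice the judge would be an LLM evaluator rather than a word count.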

RELAI@ReliableAI·Nov 17

One notebook to: "build -> simulate -> evaluate -> optimize" your agentic RAG! 🔗 Notebook: colab.research.google.com/drive/1N9l0PhO…

Soheil Feizi@FeiziSoheil

🚀 Sharing a Colab notebook with a complete learning loop for reliable agentic RAG:

🧱 Build agentic RAG on top of your own data
🎭 Simulate with persona-based runs to stress-test your agent
🧑‍⚖️ Evaluate quality automatically with Critico (LLM-as-a-judge)
🎛️ Optimize configs & structure with Maestro for better performance

All in a single notebook. Works with any model; just drop in your API key.

🔗 Notebook: colab.research.google.com/drive/1N9l0PhO…

Powered by RELAI (relai.ai)

RELAI Retweeted

Soheil Feizi@FeiziSoheil·Nov 5

🚀 Build AI agents that actually work, in just 2 hours!

We’re launching Reliable AI Agent Sprints: free, fully virtual sessions to build practical, reliable agentic solutions. This isn’t a flashy demo contest; we’ll design, simulate, evaluate, and optimize real agents, addressing the nuances and challenges end to end.

The sprints are open to everyone: students, builders, data folks, PMs, founders. No ML experience required (basic Python helps). We’ll share templates and all materials before each session.

For the first sprint, we’ll focus on an agentic RAG assistant, an AI agent that uses your documents and data to complete tasks. By the end, you’ll have a reliable agent, demos, and performance metrics.

We appreciate @googlecloud support with credits for Gemini and Cloud Run. RELAI @ReliableAI will also provide credits to simulate, evaluate, and optimize your agents.

Dates: Nov 11, 12, and 15 (identical sessions)

👉 Pick a session and register now (limited seats)
Link: lu.ma/relai

RELAI@ReliableAI·Oct 31

🎃 Here’s a sweet Halloween treat from RELAI: We built an AI agent that maps the best trick-or-treat route for you—optimized for time, distance, candy variety, and real walking paths. 👉 Try it free: platform.relai.ai/halloween Built at RELAI.ai, where we ship reliable agents (no stalls, no spooky bugs). 👻

RELAI@ReliableAI·Oct 31

@elonmusk @nikitabier @Premium Our CEO’s account @FeiziSoheil has been locked since Oct 20 in an “unusual login → change password” loop. We’ve filed multiple tickets, but the phone-verify form says “Duplicate case.” Looks like a bug in your system; please take a look (screenshot attached)

RELAI@ReliableAI·Oct 29

@Chaokoh2 @Peng1M @TensorSlay it does. Maestro optimizes both configs (like GEPA etc) and more (i.e. agent graph). Take a look at the example here: github.com/relai-ai/relai…

chaokoh@Chaokoh2·Oct 29

@ReliableAI @Peng1M @TensorSlay Maestro [...] suggest graph-level changes > does that mean it optimises the set but not each component the way DSPy/GEPA does?

Tensor-Slayer@TensorSlay·Sep 10

I actively try to add DSPy + GEPA into everything I am working on these days. It has been such a joy. My only gripe is it’s going viral and I hate it because I somehow feel like everyone is getting a superpower which was exclusive to “nerds”

Omar Khattab@lateinteraction

originally a Ruby concept, at least for me

RELAI@ReliableAI·Oct 29

@Peng1M @Chaokoh2 @TensorSlay Maestro is live! Plz check it out!

YM@Peng1M·Sep 10

@Chaokoh2 @TensorSlay @ReliableAI Will check it out!

RELAI@ReliableAI·Oct 29

@TensorSlay Plz give our agent optimizer, Maestro, a try and let us know how we do: x.com/ReliableAI/sta…

RELAI@ReliableAI·Oct 28

@jbhuang0604 Thank you!

Jia-Bin Huang@jbhuang0604·Oct 28

@ReliableAI Super cool! Congrats on the launch!

RELAI@ReliableAI·Oct 28

@faryad_ds Yes the sdk can be used fully locally. The (free) subscription is for additional features like providing user reviews and annotating samples on the platform, creating benchmarks, etc. Plz give it a try and let us know how we do!

Faryad Sahneh@faryad_ds·Oct 28

@ReliableAI Can the SDK be used fully locally (besides LLM calls), or does it require a remote service (and thus a RELAI subscription)?

RELAI@ReliableAI·Oct 28

@YogeshBalaji95 Thank you!

Yogesh@YogeshBalaji95·Oct 28

@ReliableAI Congratulations on this launch. Very impressive!

RELAI@ReliableAI·Oct 27

@FeiziSoheil x.com/ReliableAI/sta…

RELAI Retweeted

Soheil Feizi@FeiziSoheil·Sep 8

Introducing Maestro: the holistic optimizer for AI agents. Maestro optimizes the agent graph and tunes prompts/models/tools, fixing agent failure modes that prompt-only or RL weight tuning can’t touch. Maestro outperforms leading prompt optimizers (e.g., MIPROv2, GEPA) on multiple benchmarks (IFBench, HotpotQA) with far fewer rollouts. 📄 Technical report: arxiv.org/pdf/2509.04642 🎯 Early access for select builders: comment under this thread or ping us at relai.ai

RELAI@ReliableAI·Oct 27

@chengez1114 Thank you, and yes plz do!

Yize Cheng@chengez1114·Oct 27

@ReliableAI Looks awesome! Can’t wait to try RELAI in my own research and projects!

RELAI@ReliableAI·Oct 27

@b_shrir Thank you! Let us know how we do in improving your agents!

Sriram B@b_shrir·Oct 27

@ReliableAI Glad to see that this is out for everyone to use! Looking forward to using this in my projects

RELAI Retweeted

Soheil Feizi@FeiziSoheil·Sep 10

let’s talk instruction-following: In prod, “did it follow the spec?” matters more than vibes. IFBench is a challenging benchmark to check whether agents/models obey unseen output/format constraints (length windows, HTML/Markdown rules, sectioning, etc.). That’s a real reliability target: if your output violates constraints, downstream automation breaks.
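As a rough illustration of the kind of output constraints mentioned here (length windows, Markdown sectioning, HTML rules), the sketch below checks a response against three simple rules. It is not IFBench's actual checker; the function names, the `## ` heading convention, and the specific rules are assumptions made for this example.

```python
import re


def check_word_window(text: str, lo: int, hi: int) -> bool:
    """True if the whitespace-delimited word count lies in [lo, hi]."""
    return lo <= len(text.split()) <= hi


def check_markdown_sections(text: str, required: list[str]) -> bool:
    """True if every required '## ' heading appears, in the given order."""
    headings = re.findall(r"^##[ \t]+(.+?)[ \t]*$", text, flags=re.MULTILINE)
    remaining = iter(headings)
    # `name in remaining` consumes the iterator, so this is an ordered
    # subsequence test rather than a plain membership test.
    return all(name in remaining for name in required)


def check_no_html(text: str) -> bool:
    """True if the text contains nothing that looks like an HTML tag."""
    return re.search(r"<[a-zA-Z/][^>]*>", text) is None


def violated_constraints(
    text: str, lo: int, hi: int, required_sections: list[str]
) -> list[str]:
    """Return the names of all constraints the text fails."""
    failed = []
    if not check_word_window(text, lo, hi):
        failed.append("length")
    if not check_markdown_sections(text, required_sections):
        failed.append("sections")
    if not check_no_html(text):
        failed.append("html")
    return failed
```

A downstream pipeline could reject or retry any response for which `violated_constraints(...)` is non-empty, which is the failure the post warns about: output that breaks the format breaks the automation consuming it.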

RELAI@ReliableAI·Sep 8

Prompt Tuning ≠ System Tuning. Most AI agent failures are structural; we keep the agent graph frozen (modules & info flow), then wonder why agents hallucinate, misroute tools, or break guidelines. Meet Maestro: the first joint graph + config optimizer for AI agents. It optimizes the agent structure and tunes prompts/models/tools, fixing failure modes that prompt-only or RL weight tuning can’t touch. Maestro outperforms leading prompt optimizers (e.g., MIPROv2, GEPA) on multiple benchmarks (IFBench, HotpotQA) with far fewer rollouts. 📄 Technical report: arxiv.org/pdf/2509.04642 🎯 Early access for select builders: comment under this thread or ping us at relai.ai

RELAI Retweeted

Soheil Feizi@FeiziSoheil·Aug 8

How good (or bad) is GPT-5, and does it matter for you?

I’ve been seeing a lot of posts lately debating the quality of GPT-5’s responses. I tried a few of the examples people mentioned. Here’s one from my own experiment (screenshot attached): I asked GPT-5 to solve a simple arithmetic problem. It got it wrong. Then, when I allowed it to use a calculator tool, it got it right.

So what does this tell us? I think many people make a mistake in framing LLMs as some kind of “AGI in a box.” That’s not how they’re meant to work and honestly, it’s not helpful to think of them that way.

LLMs aren’t replacements for the thousands of sophisticated tools we already rely on in engineering, science, or other technical fields. They are language models: brilliant at pattern recognition, reasoning with context, and increasingly good at knowing when they don’t know something… and then using the right “tool” to get it right.

That’s why I believe the real future is AI agents as systems where the LLM is a crucial component, but the system’s strength comes from orchestrating other agents, models, and tools via APIs, MCPs, or even a bit of Python.

My favorite analogy: LLMs are like bricks in a building. You can have bricks of different shapes, sizes, and strengths, but to make an amazing building, you still need tools, engineering, and expertise to assemble them into something greater. The magic isn’t just in the brick. It’s in the architecture.
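The calculator anecdote can be sketched in a few lines of Python. This is a minimal illustration, not how GPT-5 or any particular agent framework does tool calling: `calculator`, `answer`, and the `llm` callable are hypothetical names, and the routing rule (a regular expression that recognizes plain arithmetic) stands in for the model deciding on its own to call a tool.

```python
import ast
import operator
import re
from typing import Callable

# Operators the toy calculator accepts; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}


def calculator(expression: str) -> float:
    """Evaluate +, -, *, / arithmetic by walking the parsed syntax tree."""

    def ev(node: ast.AST) -> float:
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.operand))
        raise ValueError("unsupported expression")

    return ev(ast.parse(expression.strip(), mode="eval").body)


_ARITHMETIC = re.compile(r"[\d\s.+\-*/()]+")


def answer(question: str, llm: Callable[[str], str]) -> str:
    """Route plain arithmetic to the calculator; send the rest to the LLM."""
    q = question.strip()
    if _ARITHMETIC.fullmatch(q) and any(c.isdigit() for c in q):
        try:
            return str(calculator(q))
        except (ValueError, SyntaxError, ZeroDivisionError):
            pass  # not valid arithmetic after all; fall back to the LLM
    return llm(question)
```

For example, `answer("12345 * 6789", llm)` returns `"83810205"` without consulting the model, while `answer("What is an agent?", llm)` is passed through to `llm`. The system's reliability on arithmetic comes from the routing and the tool, not from the language model.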