Lev Neiman
@NeimanLev
17 posts
Joined March 2021
325 Following · 23 Followers
Galileo@rungalileo·
Building a crew of agents is the easy part. Knowing what they're doing and stopping them when they're off course is where most teams get stuck.

We're co-hosting a live session tomorrow with @crewAIInc to cover exactly that. Join Galileo co-founder and CTO @YashSheth46 and CrewAI founder and CEO @joaomdmoura as they walk through how to govern multi-agent systems at scale, covering behavior, cost, and compliance.

In this session, you'll learn:
→ How to enforce safety and security policies in CrewAI agents
→ How to steer agents to the best models and fallback tools at runtime to improve accuracy and control token costs
→ How to govern all your agents, whether CrewAI, internal, or third-party, with one centralized set of policies
→ How to include non-technical stakeholders (such as risk and compliance) in writing or maintaining policies – no coding required

Last chance to register here: galileo.ai/webinar/govern…
Lev Neiman@NeimanLev·
Lfg!!!
Galileo@rungalileo

🚀 Big News: Galileo is joining forces with @Cisco! 🚀 We are thrilled to announce a massive milestone: Cisco has announced its intent to acquire Galileo!

Five years ago, we started Galileo with a simple but bold mission: to solve the "trust problem" for software built with language models (aka NLP). We saw early on that these software workloads were fundamentally different: non-deterministic, unpredictable, and requiring a completely new approach to observability.

Today, language model powered AI software is increasingly ubiquitous, the "trust gap" is the biggest bottleneck to unleashing AI at scale, and Galileo's platform has been rapidly adopted by some of the world's largest enterprises to ship trustworthy AI products.

@splunk and Cisco more broadly have been pioneers in the observability and security space for decades. In becoming part of Cisco, we are excited and prepared to redefine how the world builds, deploys, and trusts AI at scale. The opportunity ahead of us is massive, and we are only getting started.

What does this mean for our customers? The most important thing to know is that our commitment to you remains unchanged. You will still be working with the same reliable Galileo team you know and trust. However, we are now turbocharged with the "superpowers" of Cisco and Splunk! ⚡️

We are incredibly grateful to our team, our partners, and, most importantly, our users. We are always here for you, and we couldn't be more excited about this next chapter. Onward! 🚀✨

@vikramchatterji, Atin, and @YashSheth46

Learn more here: blogs.cisco.com/news/Cisco-ann…

Lev Neiman retweeted
Galileo@rungalileo·
Tomorrow, we're Taming the Claw 🦞 Join our engineer, @NeimanLev, at 10 am PST tomorrow as he walks you through how to use the Agent Control OpenClaw plugin to close the governance gaps that prompt-based safety can't cover. 🎟️ Register here: galileo.ai/webinar/taming…
Galileo@rungalileo

🦞 OpenClaw is one of the most capable agent frameworks available. It's also one of the easiest to lose control of.

Prompt-based safety doesn't survive at scale, so we're running Taming The Claw, a hands-on workshop where our engineer, @NeimanLev, shows you how to layer Agent Control on top of OpenClaw to close the governance gaps that prompt-based safety can't cover.

You'll learn:
→ How to install the Agent Control OpenClaw plugin
→ How to set up centralized governance for tool calling
→ Policy patterns for common failure modes: unconstrained tool access, permission escalation, uncontrolled sub-agents, and memory leakage

You'll leave with:
→ A working Agent Control + OpenClaw integration you can adapt for your stack
→ A centralized control plane your entire team can update in minutes

This is for engineers building with or evaluating OpenClaw who want production-grade governance.

🎟️ Register here: galileo.ai/webinar/taming…

Lev Neiman@NeimanLev·
I built the same caching library twice. Once at @DoorDash in Kotlin, now at @rungalileo in Python. Both times, the problems were identical. Here's what I learned 🧵
Lev Neiman@NeimanLev·
@DoorDash @rungalileo At @rungalileo it cut p50 latency by 50% and CPU by 40% on our highest-traffic endpoint. We validated by ramping back to 0% and watching latency return to baseline. Then left it on.
Lev Neiman@NeimanLev·
@DoorDash @rungalileo So I built GCache: an open-source Python library that wraps Redis + cachetools with guardrails:
→ Caching off by default (opt-in only)
→ Gradual rollout (ramp 0→100%, kill switch via config)
→ One call invalidates all caches for a user/entity
→ Built-in Prometheus metrics
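The ramp and invalidation guardrails above can be sketched in a few lines. This is a hypothetical, minimal sketch, not the actual GCache API: `RampedCache`, `cached`, and `invalidate_entity` are made-up names, and a plain dict stands in for Redis + cachetools so the opt-in/ramp/kill-switch logic is easy to see.

```python
import zlib


class RampedCache:
    # Hypothetical sketch of the guardrails described in the tweet,
    # NOT the real GCache API. A dict stands in for Redis/cachetools.

    def __init__(self, ramp_pct: int = 0):
        self.ramp_pct = ramp_pct      # 0 = off (the safe default), 100 = fully on
        self._store: dict = {}

    def _in_ramp(self, key: str) -> bool:
        # Stable hash (not Python's randomized hash()), so a given key
        # stays in or out of the ramp as the percentage grows 0 -> 100.
        return zlib.crc32(key.encode()) % 100 < self.ramp_pct

    def cached(self, fn):
        def wrapper(entity_id, *args):
            key = f"{fn.__name__}:{entity_id}:{args}"
            if not self._in_ramp(key):          # ramp / kill switch: bypass cache
                return fn(entity_id, *args)
            if key not in self._store:          # miss: compute and store
                self._store[key] = fn(entity_id, *args)
            return self._store[key]
        return wrapper

    def invalidate_entity(self, entity_id) -> None:
        # One call drops every cached value for a user/entity.
        marker = f":{entity_id}:"
        for key in [k for k in self._store if marker in k]:
            del self._store[key]


cache = RampedCache(ramp_pct=100)   # fully ramped for the demo
calls = []

@cache.cached
def load_profile(user_id):
    calls.append(user_id)           # track real (uncached) executions
    return {"id": user_id}

load_profile("u1")
load_profile("u1")                  # second call served from cache
cache.invalidate_entity("u1")
load_profile("u1")                  # recomputed after invalidation
```

Setting `ramp_pct=0` makes the decorator a no-op, which is the "validated by ramping back to 0%" check from earlier in the thread.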
Lev Neiman@NeimanLev·
@DoorDash @rungalileo Every team adds caching the same way: one engineer decorates a function, picks a random key pattern, and ships it. Six months later you've got 6 different key patterns, no hit rate visibility, and invalidation logic scattered everywhere.
Sawyer Merritt@SawyerMerritt·
NEWS: Tesla owners who previously purchased Enhanced Autopilot can now subscribe to FSD (Supervised) for $49/month, reduced from the previous $99/month.
Lev Neiman@NeimanLev·
@romitjain_ @rungalileo This is a great point! For our use case, Luna metrics return fixed-length responses. In cases with high output-length variability, we would need a way to estimate that upfront and factor it into the score function.
r0@romitjain_·
@rungalileo Nice work! Quick question: the load score is calculated based on input length only. Why is that hypothesis valid? A server can take a long time (a high number of output tokens) to respond to a small query, too. Or is the output always small, irrespective of the input?
Galileo@rungalileo·
For runtime agent observability, we had to solve the GenAI inference problem of poor GPU utilization and unpredictable latency. Here's how our engineer Lev Neiman did it 👇

Lev's solution: client-side load balancing backed by Redis & Lua scripting:
– Clients compute a load score for each inference request (based on payload size)
– Redis maintains a real-time view of GPU fleet load using sorted sets
– Lua scripts ensure atomic operations, picking the least busy GPU and incrementing its score simultaneously
– A background reconciler handles failures and keeps scores accurate

The results:
📈 ~40% increase in average GPU utilization
📉 70% reduction in tail latency
☑️ Same infrastructure, zero additional servers

For our customers who run millions of agent logs per month and depend on Galileo for runtime intervention at scale, this has been critical.

Read more about it in Lev's blog below 👇
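The pick-least-busy step can be sketched in pure Python. This is a minimal sketch under stated assumptions, not Galileo's implementation: `FleetBalancer` and `load_score` are hypothetical names, and a local dict stands in for the Redis sorted set. In the real system the equivalent logic runs server-side as a Lua script, so "read lowest score + increment it" is atomic across many concurrent clients.

```python
# A server-side Lua script (illustrative, using real Redis commands) might be:
#   local gpu = redis.call('ZRANGE', KEYS[1], 0, 0)[1]   -- lowest-score member
#   redis.call('ZINCRBY', KEYS[1], ARGV[1], gpu)         -- bump its load
#   return gpu


def load_score(payload: str) -> float:
    # Per the thread: Luna metrics return fixed-length responses, so input
    # size alone is a workable proxy for the cost of a request.
    return float(len(payload))


class FleetBalancer:
    def __init__(self, gpus):
        self.scores = {g: 0.0 for g in gpus}   # stand-in for the sorted set

    def acquire(self, payload: str) -> str:
        gpu = min(self.scores, key=self.scores.get)   # least busy GPU
        self.scores[gpu] += load_score(payload)       # increment in the same step
        return gpu

    def release(self, gpu: str, payload: str) -> None:
        # In production a background reconciler also does this for requests
        # that fail mid-flight, keeping scores accurate.
        self.scores[gpu] = max(0.0, self.scores[gpu] - load_score(payload))


fleet = FleetBalancer(["gpu-a", "gpu-b"])
first = fleet.acquire("x" * 10)   # routed to one GPU; its score becomes 10
second = fleet.acquire("x" * 4)   # routed to the other, now less busy, GPU
```

Because the score is incremented at acquire time rather than when the request completes, new requests immediately see the updated load, which is what keeps a burst of concurrent requests from all landing on the same GPU.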
The Prodigy@the_prodigy·
Due to illness within the band, tonight’s show at The Warfield is cancelled. If you got tickets via AXS online or by phone, a refund will automatically be issued to the card you used to purchase within 30 business days. Otherwise, refunds are available at point of purchase.
Lilian Weng@lilianweng·
After working at OpenAI for almost 7 years, I decided to leave. I learned so much and now I'm ready for a reset and something new. Here is the note I just shared with the team. 🩵