Pinned Tweet
EvalOps
47 posts

EvalOps
@EvalOpsDev
The control plane for running and governing agents in production.
San Francisco, CA · Joined July 2025
1 Following · 309 Followers

EvalOps is where evaluations meet operations — and security is no exception.
“keep” shows how device posture, SSO, and OPA policies can be continuously tested and traced like any other system.
Run it, break it, measure it.
github.com/evalops/keep

Agents are already writing your code. The question isn't "should we use them?" It's "how do we ship them without surprises?"
Provenance gives you a ledger. Every line. Every agent. Every risk. Measurable.
github.com/evalops/proven…

We’re open-sourcing Smith — the Firecracker-based CI runner that powers EvalOps.
Why rebuild Blacksmith?
Because eval gating needs specialized infra — and we’re not forcing you onto our cloud.
Run evals on EvalOps Cloud or your own. github.com/evalops/smith

Everyone wants to move fast.
@EvalOpsDev makes sure you don’t break trust along the way.
Governed AI releases start here.
Jonathan Haas @JonathanHaas
Shipped a new home for @EvalOpsDev. No fluff, just governed AI releases. Check it out -> evalops.dev

🔥 Just dropped an evaluation‑driven LoRA loop built on Tinker from @thinkymachines! It trains, benchmarks & iterates until your model meets the mark. It auto‑spots weaknesses, spawns targeted LoRA jobs & tracks improvements.
Proof‑of‑concept repo:
github.com/evalops/tinker…

Sick of yak-shaving to get a clean Transformers setup?
We built a stack that just works:
PyTorch + HF Transformers
Hydra configs
FastAPI serving + Prometheus
vLLM, LoRA, flash-attn, bitsandbytes
Reproducible. Dockerized. CI/CD baked in.
github.com/evalops/stack

Developer resumes are frozen in time. GitHub tells the real story.
7k commits, +1.4M lines → now that’s a holographic trading card worth flexing. 🚀
cards.evalops.dev
EvalOps retweeted

🐙 Meet Mocktopus — multi‑armed mocks for your LLM apps!
🧪 Deterministic mocks for OpenAI‑style chat completions, tool calls & streaming.
Make your evals deterministic, run CI offline, and record & replay.
👉 github.com/evalops/mockto…






