EvalOps

47 posts

@EvalOpsDev

The control plane for running and governing agents in production.

San Francisco, CA · Joined July 2025
1 Following · 309 Followers

Pinned Tweet
EvalOps (@EvalOpsDev)
Got tired of customers asking 'how do I know your eval results are real?' Fair question. So we made them mathematically provable.
[image]
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 811

EvalOps (@EvalOpsDev)
everyone's like "how big is your team" brother. it's one agent. it's opening PRs against itself. i haven't written code in four months. leave me alone
Replies: 0 · Reposts: 1 · Likes: 3 · Views: 248

EvalOps (@EvalOpsDev)
the best AI coding assistant might be the one that works on a plane
[image]
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 267

EvalOps (@EvalOpsDev)
Every release is a high‑wire act. Instead of praying for calm winds, build a net. EvalOps ties your policies, metrics and audits into a mesh that lets you scale without falling.
[image]
Replies: 0 · Reposts: 0 · Likes: 1 · Views: 658

EvalOps (@EvalOpsDev)
We open-sourced Nimbus – Firecracker-based CI for AI workloads. Multi-tenant isolation, RBAC, audit logs.
[image]
Replies: 0 · Reposts: 0 · Likes: 1 · Views: 643

EvalOps (@EvalOpsDev)
EvalOps is where evaluations meet operations — and security is no exception. “keep” shows how device posture, SSO, and OPA policies can be continuously tested and traced like any other system. Run it, break it, measure it. github.com/evalops/keep
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 111

EvalOps (@EvalOpsDev)
Agents are already writing your code. The question isn't "should we use them?" It's "how do we ship them without surprises?" Provenance gives you a ledger. Every line. Every agent. Every risk. Measurable. github.com/evalops/proven…
Replies: 1 · Reposts: 1 · Likes: 1 · Views: 536

EvalOps (@EvalOpsDev)
We’re open-sourcing Smith — the Firecracker-based CI runner that powers EvalOps. Why rebuild Blacksmith? Because eval gating needs specialized infra — and we’re not forcing you onto our cloud. Run evals on EvalOps Cloud or your own. github.com/evalops/smith
Replies: 1 · Reposts: 0 · Likes: 1 · Views: 380

EvalOps (@EvalOpsDev)
I'm told we're doing awards now?
[image]
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 923

EvalOps (@EvalOpsDev)
🔥 Just dropped an evaluation‑driven LoRA loop built on Tinker from @thinkymachines! It trains, benchmarks & iterates until your model meets the mark. It auto‑spots weaknesses, spawns targeted LoRA jobs & tracks improvements. Proof‑of‑concept repo: github.com/evalops/tinker…
Replies: 0 · Reposts: 1 · Likes: 3 · Views: 581
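
The evaluate, target, retrain cycle described in the post above can be sketched in a few lines. This is a toy illustration only, not the code in the linked repo and not the Tinker API: `improvement_loop`, `evaluate`, and `train_on` are hypothetical names, and `train_on` stands in for whatever launches a targeted LoRA job.

```python
from typing import Callable, Dict, List

Scores = Dict[str, float]  # benchmark category -> score in [0, 1]


def improvement_loop(
    evaluate: Callable[[], Scores],
    train_on: Callable[[str], None],
    target: float,
    max_rounds: int = 5,
) -> List[Scores]:
    """Evaluate, fine-tune on the weakest category, and repeat until every
    category reaches `target` or the round budget runs out.

    All names here are hypothetical; `train_on` is a stand-in for
    launching a targeted LoRA job.
    """
    history: List[Scores] = []
    for _ in range(max_rounds):
        scores = evaluate()
        history.append(scores)  # track improvements round by round
        weakest = min(scores, key=scores.get)  # auto-spot the weak area
        if scores[weakest] >= target:
            break  # the model meets the mark everywhere
        train_on(weakest)
    return history
```

Passing `evaluate` and `train_on` in as callables keeps the loop testable with stubs before any real training backend is attached.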

EvalOps (@EvalOpsDev)
Sick of yak-shaving to get a clean Transformers setup? We built a stack that just works:
PyTorch + HF Transformers
Hydra configs
FastAPI serving + Prometheus
vLLM, LoRA, flash-attn, bitsandbytes
Reproducible. Dockerized. CI/CD baked in.
github.com/evalops/stack
Replies: 0 · Reposts: 1 · Likes: 3 · Views: 404

EvalOps (@EvalOpsDev)
Developer resumes are frozen in time. GitHub tells the real story. 7k commits, +1.4M lines → now that’s a holographic trading card worth flexing. 🚀 cards.evalops.dev
Replies: 0 · Reposts: 0 · Likes: 1 · Views: 56

EvalOps retweeted
Jonathan Haas (@JonathanHaas)
LLM vendor: “Just quantization.” Reality: reward-hacked code, broken workflows, lost week. Companies: “nbd.” Users: 🙃🔥 Making this a thing of the past.
[image]
Replies: 0 · Reposts: 1 · Likes: 2 · Views: 423

EvalOps (@EvalOpsDev)
Interested? DM for early access.
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 180

EvalOps (@EvalOpsDev)
This transforms AI codegen from a toy that produces drafts into a partner you can trust to do real work.
Replies: 1 · Reposts: 0 · Likes: 1 · Views: 196

EvalOps (@EvalOpsDev)
All of us have been dazzled by large language models’ ability to spit out code, fix bugs, or draft boilerplate. But when you put that code into production, every hidden bug is a potential outage, compliance fine, or security hole. And today’s AI tools leave you guessing.
Replies: 1 · Reposts: 0 · Likes: 3 · Views: 476

EvalOps (@EvalOpsDev)
🐙 Meet Mocktopus — multi‑armed mocks for your LLM apps! 🧪 Deterministic mocks for OpenAI‑style chat completions, tool calls & streaming. Make your evals deterministic, run CI offline, and record & replay 👉 github.com/evalops/mockto…
Replies: 0 · Reposts: 1 · Likes: 5 · Views: 1K
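
To make "record & replay" concrete, here is a minimal sketch of the general technique for OpenAI-style chat requests. It is not Mocktopus's actual API: `ReplayMock`, `request_key`, and the in-memory `cassette` are hypothetical names chosen for illustration. The idea is to fingerprint each request, call the real backend once, and serve the stored response afterwards so tests run deterministically and offline.

```python
import hashlib
import json
from typing import Callable, Dict, Optional


def request_key(request: dict) -> str:
    """Stable fingerprint of a chat-completion style request."""
    canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ReplayMock:
    """Hypothetical record-and-replay wrapper (not Mocktopus's real API)."""

    def __init__(self, backend: Optional[Callable[[dict], dict]] = None):
        self.backend = backend  # real client call; None means offline mode
        self.cassette: Dict[str, dict] = {}  # fingerprint -> recorded response

    def complete(self, request: dict) -> dict:
        key = request_key(request)
        if key in self.cassette:
            return self.cassette[key]  # replay: same input, same output
        if self.backend is None:
            raise KeyError("no recorded response for this request")
        response = self.backend(request)  # record on first sight
        self.cassette[key] = response
        return response
```

In CI, a cassette recorded once against a live backend can be loaded into a `ReplayMock()` with no backend, so any unrecorded request fails loudly instead of reaching the network.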