EvalOps

47 posts

@EvalOpsDev

The control plane for running and governing agents in production.

San Francisco, CA · Joined July 2025

1 Following · 309 Followers

Pinned Tweet

EvalOps @EvalOpsDev · Nov 3

Got tired of customers asking 'how do I know your eval results are real?' Fair question. So we made them mathematically provable.

EvalOps @EvalOpsDev · May 19

everyone's like "how big is your team" brother. it's one agent. it's opening PRs against itself. i haven't written code in four months. leave me alone

EvalOps @EvalOpsDev · Nov 15

the best AI coding assistant might be the one that works on a plane

EvalOps @EvalOpsDev · Oct 25

Every release is a high‑wire act. Instead of praying for calm winds, build a net. EvalOps ties your policies, metrics and audits into a mesh that lets you scale without falling.

EvalOps @EvalOpsDev · Oct 22

We open-sourced Nimbus – Firecracker-based CI for AI workloads. Multi-tenant isolation, RBAC, audit logs.

EvalOps @EvalOpsDev · Oct 19

EvalOps is where evaluations meet operations — and security is no exception. “keep” shows how device posture, SSO, and OPA policies can be continuously tested and traced like any other system. Run it, break it, measure it. github.com/evalops/keep

EvalOps @EvalOpsDev · Oct 17

Agents are already writing your code. The question isn't "should we use them?" It's "how do we ship them without surprises?" Provenance gives you a ledger. Every line. Every agent. Every risk. Measurable. github.com/evalops/proven…

EvalOps @EvalOpsDev · Oct 15

We’re open-sourcing Smith — the Firecracker-based CI runner that powers EvalOps. Why rebuild Blacksmith? Because eval gating needs specialized infra — and we’re not forcing you onto our cloud. Run evals on EvalOps Cloud or your own. github.com/evalops/smith

EvalOps @EvalOpsDev · Oct 9

I'm told we're doing awards now?

EvalOps @EvalOpsDev · Oct 4

Everyone wants to move fast. @EvalOpsDev makes sure you don’t break trust along the way. Governed AI releases start here.

Jonathan Haas @JonathanHaas

Shipped a new home for @EvalOpsDev. No fluff, just governed AI releases. Check it out -> evalops.dev

EvalOps @EvalOpsDev · Oct 2

🔥 Just dropped an evaluation‑driven LoRA loop built on Tinker from @thinkymachines! It trains, benchmarks & iterates until your model meets the mark. It auto‑spots weaknesses, spawns targeted LoRA jobs & tracks improvements. Proof‑of‑concept repo: github.com/evalops/tinker…

EvalOps @EvalOpsDev · Sep 30

Sick of yak-shaving to get a clean Transformers setup? We built a stack that just works:
PyTorch + HF Transformers
Hydra configs
FastAPI serving + Prometheus
vLLM, LoRA, flash-attn, bitsandbytes
Reproducible. Dockerized. CI/CD baked in. github.com/evalops/stack

EvalOps @EvalOpsDev · Sep 30

Developer resumes are frozen in time. GitHub tells the real story. 7k commits, +1.4M lines → now that’s a holographic trading card worth flexing. 🚀 cards.evalops.dev

EvalOps retweeted

Jonathan Haas @JonathanHaas · Sep 28

LLM vendor: “Just quantization.” Reality: reward-hacked code, broken workflows, lost week. Companies: “nbd.” Users: 🙃🔥 Making this a thing of the past.

EvalOps @EvalOpsDev · Sep 27

Interested? DM for early access.

EvalOps @EvalOpsDev · Sep 27

This transforms AI codegen from a toy that produces drafts into a partner you can trust to do real work.

EvalOps @EvalOpsDev · Sep 27

All of us have been dazzled by large language models’ ability to spit out code, fix bugs, or draft boilerplate. But when you put that code into production, every hidden bug is a potential outage, compliance fine, or security hole. And today’s AI tools leave you guessing.

EvalOps @EvalOpsDev · Sep 24

🐙 Meet Mocktopus — multi‑armed mocks for your LLM apps! 🧪 Deterministic mocks for OpenAI‑style chat completions, tool calls & streaming. Make your evals deterministic, run CI offline, and record & replay 👉 github.com/evalops/mockto…
