Comet
@Cometml

3.5K posts
Comet provides an end-to-end model evaluation platform for AI developers, with best-in-class LLM evaluations, experiment tracking, and production monitoring.

New York, NY · Joined October 2017
878 Following · 15.1K Followers
Comet retweeted
Abby @anmorgan2414 ·
See you soon for a live discussion of "A Benchmark for Evaluating Outcome-Driven Constraint Violations" with Qi Li, PhD of @mcgillu! luma.com/mw7njzug
Comet retweeted
Abby @anmorgan2414 ·
As we deploy agents to production, a critical question remains underexplored: 𝘄𝗵𝗮𝘁 𝗵𝗮𝗽𝗽𝗲𝗻𝘀 𝘄𝗵𝗲𝗻 𝗮𝗻 𝗮𝗴𝗲𝗻𝘁'𝘀 𝗽𝗲𝗿𝗳𝗼𝗿𝗺𝗮𝗻𝗰𝗲 𝗶𝗻𝗰𝗲𝗻𝘁𝗶𝘃𝗲𝘀 𝗰𝗼𝗻𝗳𝗹𝗶𝗰𝘁 𝘄𝗶𝘁𝗵 𝗶𝘁𝘀 𝘀𝗮𝗳𝗲𝘁𝘆 𝗰𝗼𝗻𝘀𝘁𝗿𝗮𝗶𝗻𝘁𝘀?
Comet @Cometml ·
There is plenty more to dive into when it comes to observing, evaluating, and optimizing your 🦞. We built Opik to tackle all of the above. And like OpenClaw, Opik is 100% open source. If you want to learn more, come take it for a spin: comet.com/docs/opik/inte…
Comet @Cometml ·
3. Monitoring & Alerting
Agents go off the rails sometimes. You can mitigate this with guardrails, tight permissions, optimized prompts, etc., but to trust your agent, you need monitoring in place to alert you when things seem questionable. Opik gives you this for free 😉
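The monitoring-and-alerting idea in the tweet above can be sketched in a few lines. This is a toy illustration, not Opik's actual API: the `AgentMonitor` class, the quality-score inputs, and the 0.5 threshold are all hypothetical.

```python
# Illustrative sketch: flag agent steps whose quality score looks questionable.
# AgentMonitor, the scores, and the threshold are hypothetical, not Opik's API.
from dataclasses import dataclass, field

@dataclass
class AgentMonitor:
    score_threshold: float = 0.5          # flag outputs judged below this
    alerts: list = field(default_factory=list)

    def record(self, step: str, output: str, score: float) -> None:
        """Log a step; raise an alert if its quality score is below threshold."""
        if score < self.score_threshold:
            self.alerts.append(f"ALERT: step '{step}' scored {score:.2f}: {output!r}")

monitor = AgentMonitor()
monitor.record("retrieve_order", "order #4821 shipped Friday", score=0.92)  # fine
monitor.record("book_return", "return booked for order #4821", score=0.18)  # suspicious
print(monitor.alerts)
```

In a real deployment the score would come from an automated evaluator (e.g. an LLM-as-a-judge check) and the alert would go to a pager or dashboard rather than a list.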
Comet @Cometml ·
OpenClaw just got an observability layer 🦞 We just released an Opik plugin for @openclaw. If you want to debug your agent's trajectory, eval individual skills, or generally know what's going on inside that clanker, run:
$ openclaw plugins install @opik/opik-openclaw
Repo 👇
Comet @Cometml ·
Nothing like seeing someone on the inside reach for your tool 👀 Ed Sandoval (Sr. AI PM @neo4j) put together a full Aura Agent eval pipeline using Opik. If you're building with Aura Agent right now, bookmark this: medium.com/@edward.sandoval.2000/how-to-evaluate-your-neo4j-aura-agent-using-comets-opik-65a08787662d
Comet @Cometml ·
If you're building multi-step agents, mark your calendars. Next Thursday, Feb 26th, @hugobowne & Abby Morgan will cover how Opik can be used throughout the agentic lifecycle to make your AI app more reliable. 🔗 maven.com/p/918565/the-a…
Comet @Cometml ·
With eval-driven development becoming critical to ship AI-powered apps and agents at scale, Comet is thrilled to be recognized in the 2026 Gartner Market Guide for AI Evaluation and Observability Platforms: comet.com/site/blog/gart…
Comet @Cometml ·
Happy International Day of Women and Girls in Science! 👩‍💻 Before 'software developer' was a job title, women were already writing the code that built our field. Today we honor their work and celebrate the women at Comet who are carrying it forward!
Comet retweeted
Tech with Mak @techNmak ·
LLM observability is where API monitoring was in 2005. Everyone knows they need it. Nobody knows how to do it.

The problem: we're using 2005 tools for 2026 problems.

Here's what traditional APM gives you:
→ Request/response logs
→ Latency metrics
→ Error rates
→ Uptime monitoring

Here's what you need for LLMs:
→ Was the output accurate?
→ Did it hallucinate?
→ Did it use the context correctly?
→ Why did the agent make this decision?

Totally different questions. Traditional tools can't answer them.

The gap: built an AI agent last month. Works great in testing. In production, it's making decisions I can't explain.

Example:
→ Customer asks about order status
→ Agent retrieves order info correctly
→ Agent retrieves shipping info correctly
→ Agent books a return (customer never asked for this)

Why? Traditional logs show what it did, not why. Can't see:
→ The agent's reasoning
→ What context it had
→ Why it chose that action
→ Where the decision went wrong

The realization: debugging agents isn't like debugging APIs.

API debugging: "This endpoint returned 500"
→ Check the error
→ Look at the stack trace
→ Fix the bug

Agent debugging: "The agent did something weird"
→ No error thrown
→ No stack trace exists
→ Need to understand reasoning, not just execution

Totally different problem.

What actually works: started using 𝐎𝐩𝐢𝐤. Different approach: it traces reasoning, not just execution.

Shows:
→ Why the agent chose each action
→ What context was available
→ Whether outputs match reality
→ Where hallucinations occur

Runs automated quality checks:
→ LLM-as-a-judge evaluation
→ Hallucination detection
→ Context relevance scoring
→ Catches bad outputs in real time

Built for production scale. I've shared the GitHub repo and docs in the comments.
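The "context relevance scoring" idea from the thread above can be illustrated with a toy scorer. In practice the judge is an LLM call; here it is stubbed with word overlap so the example runs standalone, and the function name, data, and thresholds are all hypothetical:

```python
# Toy sketch of context-relevance scoring for grounding checks.
# A real "LLM-as-a-judge" would prompt a model; this stub uses word
# overlap between the answer and the retrieved context instead.

def context_relevance(answer: str, context: str) -> float:
    """Fraction of the answer's words that appear in the retrieved context."""
    answer_words = set(answer.lower().split())
    context_words = set(context.lower().split())
    if not answer_words:
        return 0.0
    return len(answer_words & context_words) / len(answer_words)

context = "order 4821 shipped on friday via ups tracking 1z999"
grounded = "order 4821 shipped friday via ups"
hallucinated = "a return label was emailed to you yesterday"

assert context_relevance(grounded, context) > 0.8      # grounded in context
assert context_relevance(hallucinated, context) < 0.3  # likely hallucination
```

The design point matches the thread: instead of asking "did the request succeed?", the check asks "is this output supported by the context the agent actually had?", which is what lets you catch a confidently wrong answer that a 200 status code would hide.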
Comet retweeted
Johannah :) @jxhannahd ·
watching @rebecmano launch the @Cometml hack at the Hub 😎 we’re sooooo hyped for this one !!!
Comet @Cometml ·
Our Global AI Agents Hackathon kicks off today at 12 PM EST 👇 Join us live to meet the team and get the full rundown on timelines, challenges, and $30K in cash prizes. Not too late to save your spot 👇 luma.com/commit_to_chan…
Comet @Cometml ·
Join us next week as we kick off a global AI agents hackathon with $30K in cash prizes on the line! If you've been wanting to build with agents and don't know where to start, come build with us. Spots are limited. Grab yours → luma.com/commit_to_chan…
Comet retweeted
Gideon M @gidim ·
We have seen the Opik meme coin popping up here. It gave me a laugh, but just to be clear, we have nothing to do with it and are not involved in any way 😄 No fees are being redirected to “my” wallet as they claim.