Comet
@Cometml

3.5K posts
Comet provides an end-to-end model evaluation platform for AI developers, with best-in-class LLM evaluations, experiment tracking, and production monitoring.

New York, NY · Joined October 2017
878 Following · 15.1K Followers
Comet retweeted
Abby @anmorgan2414 ·
See you soon for a live discussion of "A Benchmark for Evaluating Outcome-Driven Constraint Violations" with Qi Li, PhD of @mcgillu! luma.com/mw7njzug
Comet retweeted
Abby @anmorgan2414 ·
As we deploy agents to production, a critical question remains underexplored: 𝘄𝗵𝗮𝘁 𝗵𝗮𝗽𝗽𝗲𝗻𝘀 𝘄𝗵𝗲𝗻 𝗮𝗻 𝗮𝗴𝗲𝗻𝘁'𝘀 𝗽𝗲𝗿𝗳𝗼𝗿𝗺𝗮𝗻𝗰𝗲 𝗶𝗻𝗰𝗲𝗻𝘁𝗶𝘃𝗲𝘀 𝗰𝗼𝗻𝗳𝗹𝗶𝗰𝘁 𝘄𝗶𝘁𝗵 𝗶𝘁𝘀 𝘀𝗮𝗳𝗲𝘁𝘆 𝗰𝗼𝗻𝘀𝘁𝗿𝗮𝗶𝗻𝘁𝘀?
Comet @Cometml ·
There is plenty more to dive into when it comes to observing, evaluating, and optimizing your 🦞. We built Opik to tackle all of the above. And like OpenClaw, Opik is 100% open source. If you want to learn more, come take it for a spin: comet.com/docs/opik/inte…
Comet @Cometml ·
3. Monitoring & Alerting
Agents go off the rails sometimes. You can mitigate this with guardrails, tight permissions, optimized prompts, etc., but to trust your agent, you need monitoring in place to alert you when things seem questionable. Opik gives you this for free 😉
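The monitoring-and-alerting idea in the tweet above can be sketched in a few lines. This is a toy illustration, not Opik's actual API: the `AgentMonitor` class, the quality-score inputs, and the 0.5 threshold are all hypothetical.

```python
# Illustrative sketch: flag agent steps whose quality score looks questionable.
# AgentMonitor, the scores, and the threshold are hypothetical, not Opik's API.
from dataclasses import dataclass, field

@dataclass
class AgentMonitor:
    score_threshold: float = 0.5          # flag outputs judged below this
    alerts: list = field(default_factory=list)

    def record(self, step: str, output: str, score: float) -> None:
        """Log a step; raise an alert if its quality score is below threshold."""
        if score < self.score_threshold:
            self.alerts.append(f"ALERT: step '{step}' scored {score:.2f}: {output!r}")

monitor = AgentMonitor()
monitor.record("retrieve_order", "order #4821 shipped Friday", score=0.92)  # fine
monitor.record("book_return", "return booked for order #4821", score=0.18)  # suspicious
print(monitor.alerts)
```

In a real deployment the score would come from an automated evaluator (e.g. an LLM-as-a-judge check) and the alert would go to a pager or dashboard rather than a list.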
Comet @Cometml ·
OpenClaw just got an observability layer 🦞 We just released an Opik plugin for @openclaw. If you want to debug your agent's trajectory, eval individual skills, or generally know what's going on inside that clanker, run:
$ openclaw plugins install @opik/opik-openclaw
Repo 👇
Comet @Cometml ·
Nothing like seeing someone on the inside reach for your tool 👀 Ed Sandoval (Sr. AI PM @neo4j) put together a full Aura Agent eval pipeline using Opik. If you're building with Aura Agent right now, bookmark this: medium.com/@edward.sandoval.2000/how-to-evaluate-your-neo4j-aura-agent-using-comets-opik-65a08787662d
Comet @Cometml ·
If you're building multi-step agents, mark your calendars. Next Thursday, Feb 26th, @hugobowne & Abby Morgan will cover how Opik can be used throughout the agentic lifecycle to make your AI app more reliable. 🔗 maven.com/p/918565/the-a…
Comet @Cometml ·
With eval-driven development becoming critical to ship AI-powered apps and agents at scale, Comet is thrilled to be recognized in the 2026 Gartner Market Guide for AI Evaluation and Observability Platforms: comet.com/site/blog/gart…
Comet @Cometml ·
Happy International Day of Women and Girls in Science! 👩‍💻 Before 'software developer' was a job title, women were already writing the code that built our field. Today we honor their work and celebrate the women at Comet who are carrying it forward!
Comet retweeted
Tech with Mak @techNmak ·
LLM observability is where API monitoring was in 2005. Everyone knows they need it. Nobody knows how to do it.

The problem: we're using 2005 tools for 2026 problems.

Here's what traditional APM gives you:
→ Request/response logs
→ Latency metrics
→ Error rates
→ Uptime monitoring

Here's what you need for LLMs:
→ Was the output accurate?
→ Did it hallucinate?
→ Did it use the context correctly?
→ Why did the agent make this decision?

Totally different questions. Traditional tools can't answer them.

The gap: built an AI agent last month. Works great in testing. In production, it's making decisions I can't explain.

Example:
→ Customer asks about order status
→ Agent retrieves order info correctly
→ Agent retrieves shipping info correctly
→ Agent books a return (customer never asked for this)

Why? Traditional logs show what it did, not why. Can't see:
→ The agent's reasoning
→ What context it had
→ Why it chose that action
→ Where the decision went wrong

The realization: debugging agents isn't like debugging APIs.

API debugging: "This endpoint returned 500"
→ Check the error
→ Look at the stack trace
→ Fix the bug

Agent debugging: "The agent did something weird"
→ No error thrown
→ No stack trace exists
→ Need to understand reasoning, not just execution

Totally different problem.

What actually works: started using 𝐎𝐩𝐢𝐤. Different approach: it traces reasoning, not just execution.

Shows:
→ Why the agent chose each action
→ What context was available
→ Whether outputs match reality
→ Where hallucinations occur

Runs automated quality checks:
→ LLM-as-a-judge evaluation
→ Hallucination detection
→ Context relevance scoring
→ Catches bad outputs in real time

Built for production scale. I've shared the GitHub repo and docs in the comments.
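The "context relevance scoring" idea from the thread above can be illustrated with a toy scorer. In practice the judge is an LLM call; here it is stubbed with word overlap so the example runs standalone, and the function name, data, and thresholds are all hypothetical:

```python
# Toy sketch of context-relevance scoring for grounding checks.
# A real "LLM-as-a-judge" would prompt a model; this stub uses word
# overlap between the answer and the retrieved context instead.

def context_relevance(answer: str, context: str) -> float:
    """Fraction of the answer's words that appear in the retrieved context."""
    answer_words = set(answer.lower().split())
    context_words = set(context.lower().split())
    if not answer_words:
        return 0.0
    return len(answer_words & context_words) / len(answer_words)

context = "order 4821 shipped on friday via ups tracking 1z999"
grounded = "order 4821 shipped friday via ups"
hallucinated = "a return label was emailed to you yesterday"

assert context_relevance(grounded, context) > 0.8      # grounded in context
assert context_relevance(hallucinated, context) < 0.3  # likely hallucination
```

The design point matches the thread: instead of asking "did the request succeed?", the check asks "is this output supported by the context the agent actually had?", which is what lets you catch a confidently wrong answer that a 200 status code would hide.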
Comet retweeted
Johannah :) @jxhannahd ·
watching @rebecmano launch the @Cometml hack at the Hub 😎 we’re sooooo hyped for this one !!!
Comet @Cometml ·
Our Global AI Agents Hackathon kicks off today at 12 PM EST 👇 Join us live to meet the team and get the full rundown on timelines, challenges, and $30K in cash prizes. Not too late to save your spot 👇 luma.com/commit_to_chan…
Comet @Cometml ·
Join us next week as we kick off a global AI agents hackathon with $30K in cash prizes on the line! If you've been wanting to build with agents and don't know where to start, come build with us. Spots are limited. Grab yours → luma.com/commit_to_chan…
Comet retweeted
Gideon M @gidim ·
We have seen the Opik meme coin popping up here. It gave me a laugh, but just to be clear, we have nothing to do with it and are not involved in any way 😄 No fees are being redirected to “my” wallet as they claim.