
Calaveras AI
26 posts

@CalaverasAI
Over 100 B novel code tokens and realistic+robust RL envs
San Francisco · Joined August 2025
129 Following · 113 Followers
Pinned Tweet


there is a lot of alpha in actually looking at the data
Epoch AI @EpochAIResearch
We looked at OSWorld, a popular evaluation of AI computer use capabilities. Our findings: tasks are simple, many don't require GUIs, and success often hinges on interpreting ambiguous instructions. The benchmark is also not stable over time. See thread for details!


Meta has gone crazy on the squid game!
Many new PhD NGs are deactivated today
(I am also impacted🥲 happy to chat)
Yuandong Tian @tydsh
Several of my team members + myself are impacted by this layoff today. Welcome to connect :)

i will be at COLM in montreal next week presenting the breakpoint eval with @KaivuHariharan
Would love to meet and chat with people, some things I'm interested in right now:
- human-in-the-loop training algorithms
- scalable oversight
- automated auditing

@nikolaj2030 @Miles_Brundage it's unreal how much better the Inspect codebase is than other open source evals frameworks

@Miles_Brundage Consider using Inspect (adopting Inspect early in the process of making an eval, rather than later, saves so much headache): inspect.aisi.org.uk

@CalaverasAI @joodalooped the first is holy based. congratulations


@tkskri_kypr Congratulations! If anyone is interested in building the future of RL environments at a company leading the AI industry, please DM me. Lunch is on me ^_^

I've been interviewing so many impressive programmers who deeply understand the systems they work with, are driven, and have grit.
We work on hard problems, pay good comp, and have an exciting mission.
If you think you'd like it here, please DM me!
Zy @ZyMazza
U should be required to poast the salary you’re offering when you say stuff like this











