shin
5.8K posts

@shfunc
eng @hud_evals | researcher, 進慎




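The workmanship of crews on Japan's road-repair public works projects, long bashed as a waste of money, is for some reason going viral overseas: "To do work this careful in the UK would take £3.5 billion, five local councils plus financial aid, and 25 years."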


Aviro is introducing Ebla, a state-of-the-art grounded reasoning model. In collaboration with HUD, the Aviro team built C⁴, a benchmark for long-horizon tasks in corporate document sets. We evaluate four dimensions: Correctness, Completeness, Composition, and Citations. @aviro_ai post-trained GPT-OSS 120b to achieve SOTA performance, with a Pass@1 score of 25.4% and a Pass@8 score of 37.1%.





REAL ESTATE AGENTS — KISS YOUR JOBS GOODBYE. We built Zentro, a 24/7 real estate web agent that works for you to find your dream home. Stop chasing listings. Let them chase you. Built with @browser_use + @convex + @agentmail + @supermemory + @hud_ai

About halfway into this chat, Boris and I realized we were both from the same city in Ukraine 🤯






