samarth @ ICLR

165 posts

@samarth__go

building @harvey // prev. hrt, imc, goldman sachs, avalanche

San Francisco, CA · Joined March 2022
424 Following · 275 Followers
Tu Trinh (Alina) @thetututrain
So thrilled to share HiL-Bench, a benchmark to measure how well agents can identify when they need to ask for help in SWE and SQL tasks. Spoiler alert: not too well yet!
Scale AI @scale_AI

New @ScaleAILabs Research: Your AI agent just gave you an answer but did it actually solve the problem, get lucky, or just sound right? Today’s benchmarks can’t tell. We built HiL-Bench (Human-in-Loop Benchmark) to test a critical skill: does your agent know what it’s missing and when to ask for clarification? 🧵

samarth @ ICLR @samarth__go
Some unique challenges in getting this to happen:
- editing 300 page docx files is very different from editing a python file
- enabling 3 hour rollouts with 50+ work products created
- agents searching over 10 million documents in a single run
Many frontier problems to solve!
samarth @ ICLR reposted
Harvey @harvey
The best legal teams aren't using AI to replace lawyers’ time. They're using AI to reclaim it for judgment, strategy, and collaboration. AI agents run the workflows. Lawyers drive the outcomes. Harvey is the platform where both happen. Today we announced new funding led by GIC and Sequoia to scale the agents our customers run on Harvey and expand the legal engineering teams that help them turn expertise into systems. Read more: harvey.ai/blog/harvey-ra…
Carlson @carlsoncheng_
we just killed writer’s block. introducing nimbus, now you can write at the speed of thought. real time auto-research and compounding intelligence, no prompting required.
samarth @ ICLR @samarth__go
Excited to share what I've been building recently! We built a multi-agent pipeline to source, validate, and productionize legal infrastructure - scaling @harvey's knowledge sources from 6 to 60+ jurisdictions. Looking forward to pushing both coverage and quality even higher!
Harvey @harvey

In legal work, outcomes depend on the quality and coverage of your knowledge sources. In this blog, @samarth__go and Christopher Bello explain how we built the Data Factory to discover authoritative legal sources, validate them for compliance, and test real legal reasoning at scale. The result: Harvey scaled from six jurisdictions to 60+, and from 20 legal data sources to 400+. Full breakdown: harvey.ai/blog/using-age…

signüll @signulll
there is no one in ai today that has created any sort of meaningful network effects. ai today is ~single player & single player only. it is only multiplayer if you consider ai as another user which is entirely reasonable. multi player ai experiences are still mia & are incredibly fun to think about.
Emir Karabeg @emkara
Announcing our $7M Series A led by @Standard_Cap
Sim has gone from 0 to 60,000 developers in 5 months
Sim v0.5 announcement below
samarth @ ICLR @samarth__go
Just had my paper "SAGE: A Realistic Benchmark for Semantic Understanding" accepted to the NeurIPS 2025 LLM-Eval workshop! Excited to push for more work exploring the capabilities of models in messy, real-world environments - see you soon San Diego
Karan Goel @krandiash
We've raised $100M from Kleiner Perkins, Index Ventures, Lightspeed, and NVIDIA.
Today we're introducing Sonic-3 - the state-of-the-art model for realtime conversation.
What makes Sonic-3 great:
- Breakthrough naturalness - laughter and full emotional range
- Lightning fast -
samarth @ ICLR @samarth__go
@arb8020 The amount of documentation Claude writes and then never looks at again is absurd
arb8020 @arb8020
wrote 2334 lines to IMPLEMENTATION_PLAN.md
Jlau @JustinLau04
this tier list is going around the cs careers discord a lot, how accurate is this
[Image: the tier list referenced in the post]
Reducto @reductoai
Love free stuff? We’re giving away FIVE boxes of it stuffed with Reducto merch to celebrate our Series B!
Each box comes with:
- Our signature heavy graphic t-shirt
- A Reducto yeti tumbler
- Themed fortune cookies
All you have to do is like, tag a friend, and be following our account to be entered. We’ll pick 5 random winners 1 week from now (10/22). Good luck!
samarth @ ICLR @samarth__go
@kaylanhua I care about something and would like to vote with those who share my values ✋
kayla @kaylanhua
Sway: influence elections, the right way. Everyone cares about something. Vote with those who share your values and make it happen.
samarth @ ICLR @samarth__go
@rox_ai Congrats on the launch 🥳 big things ahead 💪
Rox @rox_ai
6 months, 25 million revenue agents & 3 trillion tokens later... Rox is now globally available 🌎 Just as coding agents 10x’d engineering, revenue agents 10x customer work. With Rox, humans are evolving to orchestrators while agents manage the end-to-end customer lifecycle. Even in Beta, Rox powered Global 2000 leaders in banking, hardware, construction, and sovereign AI, while serving dominant AI winners like @tryramp and @cognition. Rox delivers ROI in 90 days and is built with the best. Thank you to @OpenAI, @nvidia, @perplexity_ai, @awscloud, @vercel, @Snowflake, and @stripe for helping us scale.
Ali Ansari @aliansarinik
I’m excited to announce micro1 has raised a $35M Series A, valuing us at $500M. This round was led by 01A with @adambain joining our board of directors. We’re grateful to be partnering with leading AI Labs & fortune 10s, such as Microsoft, to train frontier LLMs. We’re just getting started building the infrastructure layer for AGI, with the ultimate goal of answering the very fundamental question: “where should humanity spend its time?”
Caelin @caelin_sutch
Unfortunately Christina and I have made the difficult decision to shut down @lookbk_app and will be closing access in the coming weeks.