Lan Jiang

306 posts

@lanjiang653

@lux_capital

Joined November 2022
298 Following · 588 Followers
Lan Jiang reposted
Catalyst @CatalystLabsX
Introducing Catalyst, the agent layer for all of finance. Turn any natural language idea into a live strategy: research, backtesting & execution. Don’t get left behind. Waitlist open, join now.
120 replies · 155 reposts · 1.2K likes · 390.4K views
Lan Jiang reposted
Yash Patil @ypatil125
The @cognition team ships like no other team I've seen before. Incredible velocity and always ahead of the curve on new developer paradigms. Love how well designed everything in the new version of Devin is! Crushed it @ScottWu46 and team 👏👏👏!
Cognition @cognition

Introducing Devin 2.2 – the autonomous agent that can test with computer use, self-verify, and auto-fix its work. Try it for free!

We’ve also overhauled Devin from the ground up:
- 3x faster startup
- fully redesigned interface
- computer use + virtual desktop
...and hundreds more UX and functionality improvements.

0 replies · 3 reposts · 72 likes · 7.9K views
Lan Jiang reposted
Linden Li @lindensli
A tool-calling model with access to just a filesystem can also learn how to perform non-technical tasks (in this case, legal tasks) outside of software engineering. With a well-crafted grader that nudges the model in the right direction, a model can cleanly learn how to pick up new skills, from how to use the PDF parsing tools effectively to navigating legal documents, really drawing out latent capabilities learned from pretraining. It's been great working with @mercor_ai on a state-of-the-art legal model! A big part of our recipe is high quality environments to train on.
Applied Compute @appliedcompute

We partnered with @mercor_ai to post-train custom models on high-quality expert data from fields like law, investment banking, and consulting. Our latest model ranks #1 on the APEX-Agents leaderboard in corporate law and #4 overall. Domain-specific post-training on high-quality, organization-specific data can systematically close the gap between general AI competence and expert-level reliability, making capable enterprise agents practical and affordable for knowledge-intensive industries. appliedcompute.com/case-studies/m…

3 replies · 9 reposts · 118 likes · 20K views
Lan Jiang reposted
Brendan (can/do) @BrendanFoody
.@appliedcompute achieving frontier capabilities on APEX Agents with just 2,000 tasks is incredible. Their model can produce complex legal deliverables, redlines, and slide decks. It feels like RL is becoming so powerful that it can quickly saturate any benchmark. The barrier to applying agents to the entire economy is building evals for everything. Great work @ypatil125 @rhythmrg @lindensli
Mercor @mercor_ai

Scaling Data leads to SOTA Legal Performance on APEX-Agents

@appliedcompute built a custom model (Applied Compute: Small) by post-training GLM 4.7 on nearly 2,000 samples provided by Mercor. It is now top of the APEX-Agents leaderboard in corporate law, with a Pass@1 score of 26.6% and a mean score of 54.8%. Here’s what we learnt 👇

4 replies · 9 reposts · 73 likes · 22.2K views
Lan Jiang reposted
Cognition @cognition
Introducing Devin 2.2 – the autonomous agent that can test with computer use, self-verify, and auto-fix its work. Try it for free!

We’ve also overhauled Devin from the ground up:
- 3x faster startup
- fully redesigned interface
- computer use + virtual desktop
...and hundreds more UX and functionality improvements.
93 replies · 178 reposts · 1.5K likes · 511.1K views
Lan Jiang reposted
Andy Fang @andyfang
DoorDash has learned a lot about shipping impactful AI products through our partnership with @ypatil125 @rhythmrg @lindensli at the @appliedcompute team. We're already seeing additional traction collaborating with them in other use cases we hope to share soon.
Yash Patil @ypatil125

Excited to finally share this! It was an amazing collaboration with @andyfang and the @DoorDash team! We’re thrilled to continue partnering with one of the most innovative and execution-focused AI teams in the world.

3 replies · 8 reposts · 51 likes · 17.6K views
Lan Jiang reposted
Rhythm Garg @rhythmrg
It will be increasingly important for enterprises to have crisp, automated definitions of what is "good" and "bad" when building agents for a task. These are what are ultimately used as the north star when improving agents (both via prompting and training) on the task. DoorDash is leading the charge here; we worked with them and their internal human experts to build an automated grader for an important workflow in merchant onboarding, and then used that artifact to build a useful agent together.
Applied Compute @appliedcompute

We partnered with @DoorDash to train a proprietary RL-powered agent that encodes internal QA standards into an automated grader, turning expert judgment into a scalable training signal. The result: a 30% relative reduction in critical menu errors and a production system now live across all US menu traffic. appliedcompute.com/case-studies/d…

1 reply · 5 reposts · 63 likes · 11.2K views
Lan Jiang reposted
Linden Li @lindensli
@Doordash has been a great partner in shipping our merchant onboarding model into production to all menu traffic in the US. The fun part was having an open canvas to calibrate our reward and grader to human preferences, specifically those of experts who deeply understand DoorDash's specific definition of quality. Once that was done, we just let our optimization stack loose to keep hill-climbing performance on autopilot.
Applied Compute @appliedcompute

We partnered with @DoorDash to train a proprietary RL-powered agent that encodes internal QA standards into an automated grader, turning expert judgment into a scalable training signal. The result: a 30% relative reduction in critical menu errors and a production system now live across all US menu traffic. appliedcompute.com/case-studies/d…

1 reply · 2 reposts · 35 likes · 10.7K views
Lan Jiang reposted
Applied Compute @appliedcompute
We partnered with @DoorDash to train a proprietary RL-powered agent that encodes internal QA standards into an automated grader, turning expert judgment into a scalable training signal. The result: a 30% relative reduction in critical menu errors and a production system now live across all US menu traffic. appliedcompute.com/case-studies/d…
0 replies · 14 reposts · 157 likes · 70.7K views
Lan Jiang reposted
Yash Patil @ypatil125
Excited to finally share this! It was an amazing collaboration with @andyfang and the @DoorDash team! We’re thrilled to continue partnering with one of the most innovative and execution-focused AI teams in the world.
Applied Compute @appliedcompute

We partnered with @DoorDash to train a proprietary RL-powered agent that encodes internal QA standards into an automated grader, turning expert judgment into a scalable training signal. The result: a 30% relative reduction in critical menu errors and a production system now live across all US menu traffic. appliedcompute.com/case-studies/d…

4 replies · 7 reposts · 139 likes · 34K views
Lan Jiang reposted
Rhythm Garg @rhythmrg
When improving agentic systems on real-world tasks, the quality and trustworthiness of the data is often much more important than the quantity (past reasonable thresholds). It was awesome working with Mercor for precisely that reason – their data captures economically valuable domains like corporate law and is excellent. And it's the same ethos that we bring to our engagements with enterprise customers, where we use tools to first build a high-quality internal dataset with the customer before thinking about hill climbing.
Brendan (can/do) @BrendanFoody

.@appliedcompute improved 19% on Corporate Law tasks in APEX Agents. Their model traverses data rooms with hundreds of files to prepare complex legal deliverables. This level of model improvement with just 1000 tasks is incredible and just the beginning. Great work @ypatil125, @rhythmrg, and @lindensli

1 reply · 3 reposts · 27 likes · 3.6K views
Lan Jiang reposted
Linden Li @lindensli
It’s been great getting to work with @mercor_ai on Apex Agents! Well-crafted harnesses that (1) resemble real-world work and (2) are environments agents can actually learn in can elicit some pretty powerful capabilities!
Brendan (can/do) @BrendanFoody

.@appliedcompute improved 19% on Corporate Law tasks in APEX Agents. Their model traverses data rooms with hundreds of files to prepare complex legal deliverables. This level of model improvement with just 1000 tasks is incredible and just the beginning. Great work @ypatil125, @rhythmrg, and @lindensli

0 replies · 1 repost · 22 likes · 2K views
Lan Jiang reposted
Yash Patil @ypatil125
Agentic co-workers are on the horizon!! It was awesome to work with @BrendanFoody and the rest of the Mercor team on this! Unlocking the true economic potential of AI necessitates learning from domain experts and how work is done in the real world.
Brendan (can/do) @BrendanFoody

.@appliedcompute improved 19% on Corporate Law tasks in APEX Agents. Their model traverses data rooms with hundreds of files to prepare complex legal deliverables. This level of model improvement with just 1000 tasks is incredible and just the beginning. Great work @ypatil125, @rhythmrg, and @lindensli

3 replies · 3 reposts · 87 likes · 10.9K views
Lan Jiang reposted
Applied Compute @appliedcompute
We partnered with @mercor_ai to train knowledge work agents across law, banking, and consulting. Our approach used RL on fewer than 1K expert-authored tasks, full-trajectory logging, and behavioral analysis to pinpoint where models succeeded or failed. What made the difference was closing the feedback loop. Each training run surfaced what the data was actually teaching, where progress stalled, and what to collect next. Fast iteration prevented wasted time and compute on the wrong data. The result is both a better model and a system that improves with use, with learning curves that keep climbing across all three domains.
Mercor @mercor_ai

x.com/i/article/2016…

8 replies · 8 reposts · 144 likes · 29K views
Lan Jiang reposted
Kushal Thaman @kushal1t
The data wall is massive and incredibly durable. We are going to fly over it. Today, I'm glad to announce that I've joined Flapping Airplanes, a foundational AI research lab whose singular mission is to solve the data efficiency problem. Prepare for liftoff!
Flapping Airplanes @flappyairplanes

Announcing Flapping Airplanes! We’ve raised $180M from GV, Sequoia, and Index to assemble a new guard in AI: one that imagines a world where models can think at human level without ingesting half the internet.

18 replies · 7 reposts · 100 likes · 20K views
Lan Jiang reposted
Flapping Airplanes @flappyairplanes
Announcing Flapping Airplanes! We’ve raised $180M from GV, Sequoia, and Index to assemble a new guard in AI: one that imagines a world where models can think at human level without ingesting half the internet.
339 replies · 259 reposts · 3.6K likes · 2.1M views