Aksh Garg

234 posts

@AkshGarg03

@mercor_ai, CS @stanford | ex @point72, @tesla, @spacex, @deshaw

Joined January 2022
301 Following, 1.6K Followers
Pinned Tweet
Aksh Garg @AkshGarg03
(1/5) @CKT_Conner, @dill_pkl, @emilyzsh, and I are excited to introduce Shard - a proof-of-concept for an infinitely scalable distributed system composed of consumer hardware for training and running ML models!
Features:
- Data + Pipeline Parallel for handling arbitrarily large models
- Algorithmic load balancing for throughput optimization
- Fault tolerance for unreliable machines
22
27
205
85.5K
Aksh Garg @AkshGarg03
there's so much to software beyond just code - excited to see us taking more steps towards automating the full cycle
shoutout @AbhiKottamasu and @adarsh_exe
can't wait for v2
adarsh @adarsh_exe

Traditional coding benchmarks do not reflect how software is actually built and maintained. That's why we built a new benchmark, APEX-SWE, in partnership with @cognition. It measures whether AI models can perform complex, real-world software engineering work to ship systems that work and debug them when they don't. @OpenAI GPT 5.3 Codex (High) tops the leaderboard at 41.5% on Pass@1.

0
0
11
1.5K
Aksh Garg retweeted
adarsh @adarsh_exe
Traditional coding benchmarks do not reflect how software is actually built and maintained. That's why we built a new benchmark, APEX-SWE, in partnership with @cognition. It measures whether AI models can perform complex, real-world software engineering work to ship systems that work and debug them when they don't. @OpenAI GPT 5.3 Codex (High) tops the leaderboard at 41.5% on Pass@1.
123
152
878
184.9K
Rallies Arena @ralliesarena
@AkshGarg03 Because this is a live experiment, there's no past data involved.
2
0
6
24.8K
Rallies Arena @ralliesarena
CLAUDE IS DESTROYING THE S&P 500
Claude is not only good at coding but it's also good at investing?!?!
We gave 8 different AI models $100K and let them loose in the stock market starting from the end of November ... and Claude is in first place
Claude is beating all the other models and the S&P 500
Claude: +8.7%🟢
S&P 500: +1.9%🟢
174
274
3.6K
1.6M
Aksh Garg @AkshGarg03
started 2026 resolving to read more papers… …and then spent Sunday building a research agent that ranks top AI papers + explains why they matter in 10 minutes first one’s live: akshgarg07.substack.com/p/daily-digest… laziness remains an engineer’s best friend
1
0
15
1.5K
Robby Manihani @robbymanihani
Agents in production don't fail because of bad models. They fail because of bad context.
Everyone's focused on expanding and leveraging context windows. But bigger windows ≠ better focus - models still have limited working memory. You can't just dump everything in. The bottleneck is figuring out which information actually matters before it ever hits the model.
This summer I built the COR engine at @pacecom, an AI-native BPO for insurance carriers, to solve this. It is now in production, serving leading carriers such as Prudential, NewFront, and The Mutual Group. It reads documents the way humans do - hierarchical structure, context, what governs what.
The result with the same models? An increase from 70% to 95%+ accuracy in production agents.
6
6
25
2.6K
Brendan (can/do) @BrendanFoody
Mercor (@mercor_ai) scaled from $1M to $500M in revenue run rate in the last 17 months, making us the fastest growing company of all time.
Our growth is accelerating. We averaged 11% week over week growth in July, 18% WoW growth in August, and 19% WoW growth in September.
One trend driving this meteoric growth: the Economy is Becoming an RL Environment Machine. Reinforcement learning is becoming so effective that agents can hillclimb any benchmark, but humans need to define the rewards to automate everything.
While everyone fears job loss, we’re creating a new category of knowledge work faster than at any other time in history. The future of work will converge on training agents.
We're paying out over $1M / day to people in our marketplace and hiring experts rapidly across nearly every domain: software engineers, doctors, lawyers, consultants, bankers, and many more.
149
196
1.5K
613.9K
Om Londhe @omlondhe2133
1,000 at last!
12
0
57
3.3K
Aksh Garg retweeted
Brendan McLaughlin @brendanm0407
Thrilled to share that I’ve joined @reflection_ai! We’re building superintelligent autonomous systems by co-designing research and product. Today, we’re launching Asimov. As AI benchmarks saturate, evaluation will increasingly live inside real-world products that are reward-bearing RL environments in disguise. Reach out if you’re interested in working at the intersection of RL, LLMs, and agents alongside some of the scientists and engineers behind AlphaGo, PaLM, GPT-4, Gemini!
Misha Laskin @MishaLaskin

Engineers spend 70% of their time understanding code, not writing it. That’s why we built Asimov at @reflection_ai. The best-in-class code research agent, built for teams and organizations.

15
9
72
15.3K
Aksh Garg retweeted
Sahil Adhawade @sahiladhawade
Crosby is rewriting an entire industry. We are a law firm powered by modern software and AI, and we have barely scratched the surface. On a personal note, I am pumped to be joining the @CrosbyLegal team with @Ryanjdaniels and @jsarihan! Check us out: crosby.ai
John Sarihan @jsarihan

Today we’re introducing Crosby, a hybrid AI law firm that helps rapidly growing businesses execute faster.
Contracts are connection points. They allow companies to transact with one another and create economic growth. But while every aspect of business has sped up, the way we negotiate contracts hasn’t changed in 50 years.
Crosby is building the API for human agreement. We combine the speed and intelligence of AI with the safety of lawyers-in-the-loop to review contracts in under an hour. Since quietly launching in January, we’ve reviewed over 1,000 MSAs, DPAs and NDAs for some of the fastest growing companies in history, including Cursor, Clay and UnifyGTM. Speed to execution is our north star, and today our median review time is 58 minutes. GTM teams call Crosby a secret weapon to close deals 80% faster. We’re just getting started.
Today, we’re also excited to share that we’ve raised $5.8M from Sequoia Capital and Bain Capital Ventures, as well as the founders of Ramp, Instacart, Flatiron Health, and others. Crosby is a small, talent-dense team, combining lawyers from Harvard, Stanford, and Columbia Law with engineers from Ramp, Vanta, Meta, and Google. Every engineer on our team today is a former founder. We work in person in New York City.
If our mission resonates with you, we are looking for technologists, legal experts, and former founders to join us.
For high-growth companies looking to execute faster, we’ve opened up Early Access. Sign up on our website and we’ll be in touch.

5
2
19
3.5K
Aksh Garg retweeted
Valerio Pepe @ValerPepe
New blog post with @ArmaanTip! Following Emergent Misalignment, we show that finetuning even a single layer via LoRA on insecure code can induce toxic outputs in Qwen2.5-Coder-32B-Instruct, and that you can extract steering vectors to make the base model similarly misaligned 🧵
1
3
15
1.3K
Aksh Garg retweeted
Y Combinator @ycombinator
Willow (@WillowVoiceAI) is the future of voice-first computing. Thousands of people have replaced their keyboards with Willow to get emails, docs, and AI prompting workflows done 4x faster. Congrats on the launch, @_allanguo and @LiuLawrence45!
134
113
757
293K