Himanshu Tyagi

2.3K posts

@hstyagi

Researching and building general intelligence

Joined January 2015

586 Following · 273K Followers

Himanshu Tyagi@hstyagi·23 Apr

Evolving skills to hillclimb against benchmarks is a key module for self-evolving agents. Very excited for this new open repo from Sentient.

Sentient@SentientAGI

x.com/i/article/2047…

Himanshu Tyagi reposted

Oleg Golev@oleg_golev·9 Mar

This is precisely why I'm excited about sentient.xyz/arena. The goal is to crowdsource as many different solutions as possible for the hardest AI reasoning challenges. The solution space is so vast nowadays that we have to pursue large volume and evolutionary algorithms to help us explore in parallel.

Andrej Karpathy@karpathy

The next step for autoresearch is that it has to be asynchronously massively collaborative for agents (think: SETI @home style). The goal is not to emulate a single PhD student, it's to emulate a research community of them. Current code synchronously grows a single thread of commits in a particular research direction. But the original repo is more of a seed, from which could sprout commits contributed by agents on all kinds of different research directions or for different compute platforms. Git(Hub) is *almost* but not really suited for this. It has a softly built in assumption of one "master" branch, which temporarily forks off into PRs just to merge back a bit later. I tried to prototype something super lightweight that could have a flavor of this, e.g. just a Discussion, written by my agent as a summary of its overnight run: github.com/karpathy/autor… Alternatively, a PR has the benefit of exact commits: github.com/karpathy/autor… but you'd never want to actually merge it... You'd just want to "adopt" and accumulate branches of commits. But even in this lightweight way, you could ask your agent to first read the Discussions/PRs using GitHub CLI for inspiration, and after its research is done, contribute a little "paper" of findings back. I'm not actually exactly sure what this should look like, but it's a big idea that is more general than just the autoresearch repo specifically. Agents can in principle easily juggle and collaborate on thousands of commits across arbitrary branch structures. Existing abstractions will accumulate stress as intelligence, attention and tenacity cease to be bottlenecks.

Himanshu Tyagi reposted

Sentient@SentientAGI·5 Mar

Test EvoSkill on your own benchmarks: 👉 github.com/sentient-agi/E… Read the full technical report: 👉 alphaxiv.org/abs/2603.02766 👉 arxiv.org/abs/2603.02766 Read our technical blog authored by @salahalzubi401: 👉 sentient.xyz/blog/evoskill-…

Himanshu Tyagi reposted

Sentient@SentientAGI·4 Mar

Applications are now live! Cohort 0 starts March 13th in Presidio with OpenHands, OpenRouter, alphaXiv, Fireworks, Dedalus Labs, Franklin Templeton, Founders Fund and Pantera.
→ $25K+ in prizes
→ 3 weeks building state-of-the-art AI agents
→ Many more surprises
Apply below 👇

Himanshu Tyagi@hstyagi·4 Mar

@tripathi_neil Wake me up when Claude can make better tikz images

Neil Tripathi@tripathi_neil·4 Mar

Just submitted my first paper to arXiv, and I found something that fits the growing conversation around newer models hedging their bets more and more. VB: Visibility Benchmark - checks if vision-language models can apply common-sense reasoning to determine what's actually visible in a photo. Joint work with Ernest Davis at NYU. 9 models tested: GPT-5, GPT-4o, Gemini 3.1 Pro, Gemini 2.5 Pro, Claude Opus 4.5, Claude 3.7 Sonnet, Gemma 3 12B, InternVL3-8B, and Qwen3-VL-8B. 100 image families, 300 evaluation cells each.

Himanshu Tyagi reposted

Sentient@SentientAGI·27 Feb

Today we are launching the next phase of AI reasoning development with Founders Fund, Franklin Templeton, Pantera Capital, Fireworks AI, OpenRouter, OpenHands, Dedalus Labs, alphaXiv, and more. AI is advancing at a relentless pace, but there are many reasoning capabilities we have yet to discover. Announcing Arena—an evaluation-driven platform for ideation, prototyping, and high-quality data generation—with top AI developers advancing SOTA performance on real-world enterprise reasoning tasks.

Himanshu Tyagi@hstyagi·21 Feb

@tripathi_neil Reasoning, by definition, is whatever is out of distribution for a model.

Neil Tripathi@tripathi_neil·21 Feb

Had a great conversation with Professor Charles Elkan the other day about AI agents. One thing he said that stuck with me: the argument that “models will just get better and absorb everything” is actually a paradox. If we take that to its logical conclusion, there’s no point building anything on top of models, whether that’s orchestration, agents, or tooling. But obviously that’s not true. There’s real value in the systems we build around models, not just the models themselves.

Himanshu Tyagi reposted

Sentient@SentientAGI·31 Dec

A quick and nostalgic look at our work in 2025. See you all in 2026: the year of open-source reasoning.

Himanshu Tyagi@hstyagi·12 Dec

There is more where this is coming from @iiscbangalore @artparkindia

South Park Commons India@spc_india

The first-ever deeptech demo night at SPC Bangalore was stacked with some seriously cool builds! Here's a glimpse of how people are solving hard problems in hard-tech, from India. 🧵

Himanshu Tyagi reposted

Oleg Golev@oleg_golev·11 Dec

Building a general-purpose AI agent with only open-source models is hard. Making it consistent, reliable, and fast enough for production usage is even harder. We at @SentientAGI have been optimizing both👇 Today we’re revealing SERA (Semantic Embeddings & Reasoning Agent): the AI architecture behind SERA-Crypto, our state-of-the-art agent for token research, DeFi analysis, and on-chain reasoning, combining 50+ APIs into market insights. 👉 #1 open-source agent on DMind, ahead of Perplexity Finance & Gemini, within ~2% of GPT-5 Medium on Web3 reasoning 👉 #1 on our live crypto benchmark (198 real user queries across 11 categories), beating GPT-5, Grok 4, Gemini 2.5 Pro, and Perplexity Finance More in 🧵

Sentient@SentientAGI

Announcing SERA-Crypto (Semantic Embedding & Reasoning Agent): our new reasoning architecture built for SOTA crypto research. #1 open-source agent on DMind #1 on our live crypto benchmark Outperforms GPT-5, Grok 4, Gemini 2.5 Pro, and Perplexity Finance…all under 45 seconds.

Himanshu Tyagi@hstyagi·11 Dec

When you want fast reasoning, good old semantic similarity is not bad. Use it to set up your prompts dynamically, all the way to the right tool call. This is what we use for our live crypto knowledge agent, which integrates search and about 10 different structured data APIs.
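
The routing idea in the post above can be sketched as follows. This is a minimal illustration and not Sentient's implementation: the tool names, their descriptions, and the prompt template are hypothetical, and a toy bag-of-words vector stands in for a real embedding model.

```python
import math
import re
from collections import Counter

# Hypothetical tool names and descriptions, for illustration only.
TOOLS = {
    "token_price": "current price market cap and trading volume of a token",
    "defi_yield": "lending rates staking yield and liquidity pool returns",
    "web_search": "recent news announcements and general web search",
}


def embed(text):
    """Toy bag-of-words vector; a real system would use a learned embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def route(query):
    """Pick the tool whose description is most similar to the query."""
    q = embed(query)
    return max(TOOLS, key=lambda name: cosine(q, embed(TOOLS[name])))


def build_prompt(query):
    """Assemble the prompt dynamically around the selected tool."""
    tool = route(query)
    return f"Use the {tool} tool to answer: {query}"


print(build_prompt("what is the price of this token"))
```

Because routing is a similarity lookup rather than a model call, it adds very little latency before the tool call itself.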

Sentient@SentientAGI

Himanshu Tyagi@hstyagi·21 Nov

@bdguan It is such a beautiful subject. Only friends and parents have the patience to indulge in it. Schools are busy teaching.

brian@bdguan·18 Nov

all my life i've been told that i'm naturally gifted at math because i'm chinese. straight A student. math major at ucla. but here's what people don't know: when i was in 8th grade, i got a B+ in geometry. my dad said "that's unacceptable", bought a geometry textbook, and proceeded to assign me daily problems for 6 months. then i got good at geometry. i wasn't born gifted at math(ok maybe a little), i just grew up in an environment where being good at math was a requirement. this book is filled with extra math problems my dad assigned me. and i hated him for it. it took me 10+ years to realize how thankful i am that he pushed me like that. how that was simply his love language.

Himanshu Tyagi@hstyagi·13 Nov

@deedydas Yann was pretty famous in 2010 :D And yes Soumith is a legend!

Deedy@deedydas·13 Nov

If you feel like giving up, you must read this never-before-shared story of the creator of PyTorch and ex-VP at Meta, Soumith Chintala.
> from hyderabad public school, but bad at math
> goes to a "tier 2" college in India, VIT in Vellore
> rejected from all 12 universities for US masters despite 1420 on the GRE
> fuckit.jpg
> goes to the US anyway on a J-1 visa to CMU with no plan
> applies for masters (again) to 15 universities
> rejected from all except USC and with late admissions, NYU in 2010
> finds this guy called Yann LeCun (before he was famous)
> starts getting into open source
> rejected from all jobs including DeepMind
> only job is Amazon as test engineer
> his PhD mentor helps him get a job at a small startup (MuseAmi)
> rejected from DeepMind
> couldn't get H-1B because of J-1 home return issue; gets waiver through months of approval with USCIS and US State Dept
> very low on confidence
> In 2011/12 builds one of the fastest AI inference engines on phones
> rejected from DeepMind
> emailed Yann again and joins FAIR because of Torch7 open-source work
> scrapes through bootcamp at Facebook, struggling on an HBase task
> L8/L9 engineers at Facebook struggle to get ImageNet working
> figures out numerics / hyperparam issue as an L4
> first big win!
> FAIR goes well, runs 3 person torch7 team and co-creates PyTorch
> because of politics, management wants to shut down PyTorch
> cries-at-bar.jpg, literally
> eventually some people save PyTorch and it launches in 2017
> gets an EB-1 green card!
> the rest is history...
Think about that. He went to a tier 2 college. Was rejected from all Masters programs 2x. Rejected from every single job except Amazon test engineering. Rejected from DeepMind 3x. Nearly had his baby project shut down. Struggled with visa issues. After 12 years of failures (2005-17), he eventually rose to become a VP at Meta and one of the most influential people in AI!
Soumith's story is one of resilience and he's living proof that no matter how down in the dumps you are, there's always hope.

Himanshu Tyagi@hstyagi·10 Nov

If diffusion models drive all creative arts, we will learn that humans are not more creative than a kettle dissipating heat to boil water. A bit sad...

Himanshu Tyagi@hstyagi·5 Nov

@abeirami It is a blessing and a burden! You keep on wishing that heuristics derived from beautiful, beautiful geometric insights give the best algorithms :)

Ahmad Beirami@abeirami·5 Nov

Once you see a math concept geometrically, it becomes much easier to think about, and it’s hard to go back to any other way of seeing it.

Himanshu Tyagi@hstyagi·22 Oct

ROMA is a very simple and versatile architecture that recursively breaks complex queries into simpler ones. This method of coordinating multiple agents/tools/models is apt for deep research, long horizon tasks and boosting the power of models. This is emerging as an important primitive for multiagent reasoning systems across industries. This new version of the repo is more builder friendly and comes with prompt optimizer capabilities of DSPy. You can build a lot of stuff on it!
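
The recursive decomposition described above can be sketched in a few lines. This is a generic illustration and not ROMA's actual API: `solve`, `is_atomic`, `plan`, `execute`, and `aggregate` are hypothetical names, and the toy planner simply splits on the word "and" where a real system would call a model.

```python
from typing import Callable, List


def solve(
    task: str,
    is_atomic: Callable[[str], bool],
    plan: Callable[[str], List[str]],
    execute: Callable[[str], str],
    aggregate: Callable[[str, List[str]], str],
    depth: int = 0,
    max_depth: int = 3,
) -> str:
    """Recursively split a task until it is simple enough to execute directly."""
    # Stop recursing on simple tasks, or when the depth budget is used up.
    if depth >= max_depth or is_atomic(task):
        return execute(task)
    subtasks = plan(task)
    results = [
        solve(sub, is_atomic, plan, execute, aggregate, depth + 1, max_depth)
        for sub in subtasks
    ]
    return aggregate(task, results)


# Toy stand-ins (assumptions): a real system would back these with models or tools.
def toy_is_atomic(task: str) -> bool:
    return " and " not in task


def toy_plan(task: str) -> List[str]:
    # Peel off the first clause; the remainder is decomposed by the recursion.
    return task.split(" and ", 1)


def toy_execute(task: str) -> str:
    return f"[{task}]"


def toy_aggregate(task: str, results: List[str]) -> str:
    return " + ".join(results)


answer = solve(
    "find price and compare yields and summarize news",
    toy_is_atomic, toy_plan, toy_execute, toy_aggregate,
)
print(answer)
```

The `max_depth` guard is a design choice of this sketch: it bounds the recursion so that a planner that never produces atomic subtasks still terminates.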

Salah Alzu'bi@salahalzubi401

[1/8] 🧵 🚀 ROMA (Recursive Open Meta Agents) v0.2.0 is here! Many exciting features have been added to streamline research/production threads, for better reliability and a builder-friendly ecosystem for high-performance recursive multi-agent systems. Stay tuned for the upcoming paper with some exciting results! We've completely rebuilt our framework using @DSPyOSS. In this thread: the motivation and technical details behind ROMA, exciting research directions we're exploring, and our vision for recursive agents going forward. github.com/sentient-agi/R…

Himanshu Tyagi reposted

Sentient@SentientAGI·15 Oct

We’re excited to announce that @NeurIPSConf, the biggest AI conference in the world, has accepted 4 of our papers across various categories. Some might even call it “full-stack excellence” 😁 Here’s a sneak peek at our work that’s been recognized for its breakthroughs:
➡️ OML 1.0 (Main Track): scalable LLM fingerprinting, a hundredfold improvement on legacy fingerprinting attempts for open models, injecting 24,576 persistent prints while the previous max was ~100 fingerprints…without any drop in model performance.
➡️ LiveCodeBenchPro (Data & Benchmark Track): our customized benchmark focusing on programming ability, illustrating the true capabilities of models’ coding performance. On this benchmark, we were able to create models 10x smaller, using 20% of the data, to achieve comparable results to competing models.
➡️ MindGames Arena (Competition Track): selected by NeurIPS to run an AI competition for agents to improve themselves through social games. The next paradigm of AI improvement comes through self-optimization, and we’re extremely excited to be hosting this first-of-its-kind competition to create self-improving AI.
➡️ OML (Workshops & Tutorials, Lock-LLMs): our work established the challenge and solution around model security: a primitive that lets builders develop open models with verifiable, cryptographically enforced control under white-box access.
Stay tuned for deep-dive threads throughout the week!

Himanshu Tyagi@hstyagi·10 Oct

This is not what I meant by dog fooding

Oleg Golev@oleg_golev

@SentientAGI needs to have a hot dog eating competition, 5 hot dogs is weak, I'll outeat @sandeepnailwal with 20 😤

Himanshu Tyagi@hstyagi·10 Oct

@SentientAGI @sandeepnailwal I thought you wanted to cook open source AGI. I see hot dogs.

Sentient@SentientAGI·9 Oct

Meet @sandeepnailwal, Sentient's Co-Founder and professional hot dog eater (allegedly). Drop a 🌭 if you want to see him prove he can eat 5 hot dogs in under a minute.