Pankaj Gupta

11.4K posts

@pankaj

Co-founder and CEO @yupp_ai. Ex-VP Eng Consumer Products, Coinbase. Ex-Google Pay Consumer Lead, Ex-Twitter. Co-founded 3 startups. PhD Stanford CS.

Joined November 2008
698 Following · 19K Followers
Pankaj Gupta reposted
Bo Wang @BoWang87
Prof. Donald Knuth opened his new paper with "Shock! Shock!" Claude Opus 4.6 had just solved an open problem he'd been working on for weeks — a graph decomposition conjecture from The Art of Computer Programming. He named the paper "Claude's Cycles." 31 explorations. ~1 hour. Knuth read the output, wrote the formal proof, and closed with: "It seems I'll have to revise my opinions about generative AI one of these days." The man who wrote the bible of computer science just said that. In a paper named after an AI. Paper: cs.stanford.edu/~knuth/papers/…
[image]
155 replies · 1.9K reposts · 9.1K likes · 1.2M views
Pankaj Gupta reposted
Jimmy Lin @lintool
Help Me Choose (HMC) represents the first production deployment of the LLM council concept popularized by @karpathy and others - available on @yupp_ai for you to try! We wrote up a short blurb that I'll be presenting at the #WSDM2026 Industry Track: dl.acm.org/doi/10.1145/37…
Jimmy Lin @lintool

Today, we are launching “Help Me Choose” in @yupp_ai – a new product feature where multiple AIs critique each other and debate among themselves to help users synthesize diverse perspectives and get the best answer out of their own “AI council”.

4 replies · 14 reposts · 45 likes · 6.7K views
Pankaj Gupta @pankaj
Is the Dec 2025 moment of agentic coding even bigger than the ChatGPT moment of Nov '22? We put together some scaffolding across Claude Code, Codex and others internally, and suddenly 90%+ of our merged PRs are now fully AI-generated (the rest are heavily AI-assisted).
3 replies · 1 repost · 17 likes · 757 views
Pankaj Gupta reposted
Jimmy Lin @lintool
Congratulations to @jietang @ZixuanLi_ and the entire @Zai_org team on the GLM 5 release: based on >6K votes, it’s the best open-weight model on the @yupp_ai leaderboard (with speed control)!
Z.ai @Zai_org

Introducing GLM-5: From Vibe Coding to Agentic Engineering

GLM-5 is built for complex systems engineering and long-horizon agentic tasks. Compared to GLM-4.5, it scales from 355B params (32B active) to 744B (40B active), with pre-training data growing from 23T to 28.5T tokens.

Try it now: chat.z.ai
Weights: huggingface.co/zai-org/GLM-5
Tech Blog: z.ai/blog/glm-5
OpenRouter (Previously Pony Alpha): openrouter.ai/z-ai/glm-5
Rolling out from Coding Plan Max users: z.ai/subscribe

8 replies · 11 reposts · 77 likes · 22.5K views
Pankaj Gupta reposted
Yupp @yupp_ai
@Zai_org GLM 5 is live and ready to prompt on Yupp - and is showing strong on our user-preference leaderboards 📊 Big congrats to the whole @Zai_org team! x.com/yupp_ai/status…
Yupp @yupp_ai

📢 New Model Drop: GLM 5 is now live on Yupp! We've been hosting a cloaked version of this powerful new AI, and it's shown up strong on our user-preference leaderboards – with ~6K votes, it is currently ranking #10 in Text models (with speed control filter on) 📊 Big congrats to the @Zai_org team!

3 replies · 2 reposts · 17 likes · 1.7K views
Pankaj Gupta reposted
Jimmy Lin @lintool
A day after the @claudeai Opus 4.6 launch: its performance is living up to the hype, as it sits atop our leaderboard. We’ve already gathered over 4K votes from @yupp_ai users all over the world for their real-world use cases! Great job @DarioAmodei and the @claudeai team 👏
[image]
Claude @claudeai

Introducing Claude Opus 4.6. Our smartest model got an upgrade. Opus 4.6 plans more carefully, sustains agentic tasks for longer, operates reliably in massive codebases, and catches its own mistakes. It’s also our first Opus-class model with 1M token context in beta.

10 replies · 12 reposts · 47 likes · 6K views
Pankaj Gupta reposted
Greg Brockman @gdb
Software development is undergoing a renaissance in front of our eyes. If you haven't used the tools recently, you likely are underestimating what you're missing. Since December, there's been a step function improvement in what tools like Codex can do. Some great engineers at OpenAI yesterday told me that their job has fundamentally changed since December. Prior to then, they could use Codex for unit tests; now it writes essentially all the code and does a great deal of their operations and debugging. Not everyone has yet made that leap, but it's usually because of factors besides the capability of the model.

Every company faces the same opportunity now, and navigating it well, just like with cloud computing or the Internet, requires careful thought. This post shares how OpenAI is currently approaching retooling our teams towards agentic software development. We're still learning and iterating, but here's how we're thinking about it right now:

As a first step, by March 31st, we're aiming that:
(1) For any technical task, the tool of first resort for humans is interacting with an agent rather than using an editor or terminal.
(2) The default way humans utilize agents is explicitly evaluated as safe, but also productive enough that most workflows do not need additional permissions.

In order to get there, here's what we recommended to the team a few weeks ago:

1. Take the time to try out the tools. The tools do sell themselves; many people have had amazing experiences with 5.2 in Codex, after having churned from codex web a few months ago. But many people are also so busy they haven't had a chance to try Codex yet or got stuck thinking "is there any way it could do X" rather than just trying.
- Designate an "agents captain" for your team: the primary person responsible for thinking about how agents can be brought into the teams' workflow.
- Share experiences or questions in a few designated internal channels
- Take a day for a company-wide Codex hackathon

2. Create skills and AGENTS[.md].
- Create and maintain an AGENTS[.md] for any project you work on; update the AGENTS[.md] whenever the agent does something wrong or struggles with a task.
- Write skills for anything that you get Codex to do, and commit it to the skills directory in a shared repository

3. Inventory and make accessible any internal tools.
- Maintain a list of tools that your team relies on, and make sure someone takes point on making it agent-accessible (such as via a CLI or MCP server).

4. Structure codebases to be agent-first. With the models changing so fast, this is still somewhat untrodden ground, and will require some exploration.
- Write tests which are quick to run, and create high-quality interfaces between components.

5. Say no to slop. Managing AI generated code at scale is an emerging problem, and will require new processes and conventions to keep code quality high
- Ensure that some human is accountable for any code that gets merged. As a code reviewer, maintain at least the same bar as you would for human-written code, and make sure the author understands what they're submitting.

6. Work on basic infra. There's a lot of room for everyone to build basic infrastructure, which can be guided by internal user feedback. The core tools are getting a lot better and more usable, but there's a lot of infrastructure that currently go around the tools, such as observability, tracking not just the committed code but the agent trajectories that led to them, and central management of the tools that agents are able to use.

Overall, adopting tools like Codex is not just a technical but also a deep cultural change, with a lot of downstream implications to figure out. We encourage every manager to drive this with their team, and to think through other action items; for example, per item 5 above, what else can prevent a lot of "functionally-correct but poorly-maintainable code" from creeping into codebases.
413 replies · 1.6K reposts · 12.3K likes · 2.1M views
Pankaj Gupta reposted
Yupp @yupp_ai
🧪 New Feature Release: HTML/JS Mode
Build websites, games and interactive apps with Yupp! We've enabled an exciting new modality that lets you build real-world applications with HTML and JavaScript. How to use it 👇
20 replies · 12 reposts · 63 likes · 4.2K views
Pankaj Gupta reposted
Yupp @yupp_ai
📢 New Model Drop: Grok Imagine Image is live on Yupp! It combines powerful image gen with speed and cost savings. Congrats on the API drop, @xai! We can't wait to see our community's creations - and to discover how Grok Imagine Image performs on our user-preference leaderboards.
[image]
13 replies · 13 reposts · 71 likes · 2.5K views
Pankaj Gupta reposted
Subho @Subhobhai943
Geometry #StrawberrySeed on @yupp_ai 📐🍓 Prompt: "Tell me the angle of A only." ✅ Gemini 3 Pro: 150° (Perfect logic 🧠) ❌ GPT-5: 20° (Hallucinated) ❌ Qwen: 120° (Stuck) Visual reasoning is hard! yupp.ai/share/3181c226…
[4 images]
3 replies · 4 reposts · 13 likes · 1.9K views
Pankaj Gupta reposted
Fitriansya Zahra @Frianz13
I tested 10 AI models in @yupp_ai to write NOTHING in their responses, not even a single character. Only 3 models passed the test. The rest of them failed. Tbh i can't stop laughing when reading their "model's thoughts" XD Check this out : yupp.ai/share/a798ceff…
[GIF]
1 reply · 2 reposts · 4 likes · 544 views
Pankaj Gupta reposted
fal @fal
Leading open-weight models on both Artificial Analysis & Yupp Benchmarks 👇
[2 images]
3 replies · 6 reposts · 35 likes · 8.8K views
Pankaj Gupta reposted
gcmouli @gcmouli
While devs obsess over the number of commits in 2025, and how timelines are getting compressed, and of course Claude Code etc - look at what we have been up to @yupp_ai. Look at the number of models we onboarded through the year!!! 🚀🚀🚀 Over and upwards 2026.
[image]
4 replies · 3 reposts · 13 likes · 1.8K views
Pankaj Gupta reposted
Yupp @yupp_ai
Guess the prompt:
[2 images]
25 replies · 10 reposts · 79 likes · 5.1K views
Pankaj Gupta reposted
Yupp @yupp_ai
As the year draws to a close, we’ve added a little festive cheer to our scratch cards. 🎇🎆 Which one did you get?
29 replies · 15 reposts · 99 likes · 5.2K views
Pankaj Gupta reposted
Yupp @yupp_ai
The second-to-last weekly roundup of 2025! We’ve had a busy week at Yupp: contest winners, art spotlights, leaderboard news, and welcoming new models. Here we go:
[image]
13 replies · 13 reposts · 51 likes · 2.2K views
Pankaj Gupta @pankaj
Orange in the sky…
[2 images]
2 replies · 0 reposts · 10 likes · 604 views
Pankaj Gupta reposted
Yupp @yupp_ai
Having used GPT Image 1.5 to turn a cat pic into a plushie, we noticed an extra front paw. We wanted to fix that and remove the background for a great product shot. Both GPT Image 1.5 and Nano Banana Pro 2K managed to remove the extra paw, but GPT also removed a back paw! 🙀
[2 images]
1 reply · 1 repost · 4 likes · 493 views
Pankaj Gupta reposted
Yupp @yupp_ai
AI Face-Off! Let’s pair off GPT Image 1.5 with other top image gen models on Yupp, taking them through some challenging prompts. Human preference data can reveal what technical benchmarks might not: which model is best for a variety of different real-world tasks?
[image]
16 replies · 14 reposts · 66 likes · 2.7K views