Pankaj Gupta

11.4K posts

@pankaj

Co-founder and CEO @yupp_ai. Ex-VP Eng Consumer Products, Coinbase. Ex-Google Pay Consumer Lead, Ex-Twitter. Co-founded 3 startups. PhD Stanford CS.

Joined November 2008

698 Following, 19K Followers

Pankaj Gupta retweeted

Bo Wang @BoWang87 · Mar 3

Prof. Donald Knuth opened his new paper with "Shock! Shock!" Claude Opus 4.6 had just solved an open problem he'd been working on for weeks — a graph decomposition conjecture from The Art of Computer Programming. He named the paper "Claude's Cycles." 31 explorations. ~1 hour. Knuth read the output, wrote the formal proof, and closed with: "It seems I'll have to revise my opinions about generative AI one of these days." The man who wrote the bible of computer science just said that. In a paper named after an AI. Paper: cs.stanford.edu/~knuth/papers/…

Pankaj Gupta retweeted

Jimmy Lin @lintool · Feb 23

Help Me Choose (HMC) represents the first production deployment of the LLM council concept popularized by @karpathy and others - available on @yupp_ai for you to try! We wrote up a short blurb that I'll be presenting at the #WSDM2026 Industry Track: dl.acm.org/doi/10.1145/37…

Jimmy Lin @lintool

Today, we are launching “Help Me Choose” in @yupp_ai – a new product feature where multiple AIs critique each other and debate among themselves to help users synthesize diverse perspectives and get the best answer out of their own “AI council”.

Pankaj Gupta @pankaj · Feb 14

Is the Dec 2025 moment of agentic coding even bigger than the ChatGPT moment of Nov '22? We put together some scaffolding across Claude Code, Codex and others internally, and suddenly 90%+ of our merged PRs are now fully AI-generated (the rest are heavily AI-assisted).

Pankaj Gupta retweeted

Jimmy Lin @lintool · Feb 11

Congratulations to @jietang @ZixuanLi_ and the entire @Zai_org team on the GLM 5 release: based on >6K votes, it’s the best open-weight model on the @yupp_ai leaderboard (with speed control)!

Z.ai @Zai_org

Introducing GLM-5: From Vibe Coding to Agentic Engineering

GLM-5 is built for complex systems engineering and long-horizon agentic tasks. Compared to GLM-4.5, it scales from 355B params (32B active) to 744B (40B active), with pre-training data growing from 23T to 28.5T tokens.

Try it now: chat.z.ai
Weights: huggingface.co/zai-org/GLM-5
Tech Blog: z.ai/blog/glm-5
OpenRouter (Previously Pony Alpha): openrouter.ai/z-ai/glm-5
Rolling out from Coding Plan Max users: z.ai/subscribe

Pankaj Gupta retweeted

Yupp @yupp_ai · Feb 11

@Zai_org GLM 5 is live and ready to prompt on Yupp - and is showing strong on our user-preference leaderboards 📊 Big congrats to the whole @Zai_org team! x.com/yupp_ai/status…

Yupp @yupp_ai

📢 New Model Drop: GLM 5 is now live on Yupp! We've been hosting a cloaked version of this powerful new AI, and it's shown up strong on our user-preference leaderboards – with ~6K votes, it is currently ranking #10 in Text models (with speed control filter on) 📊 Big congrats to the @Zai_org team!

Pankaj Gupta retweeted

Guillermo Rauch @rauchg · Feb 9

x.com/i/article/2020…

Pankaj Gupta retweeted

Jimmy Lin @lintool · Feb 6

A day after the @claudeai Opus 4.6 launch: its performance is living up to the hype, as it sits atop our leaderboard. We’ve already gathered over 4K votes from @yupp_ai users all over the world for their real-world use cases! Great job @DarioAmodei and the @claudeai team 👏

Claude @claudeai

Introducing Claude Opus 4.6. Our smartest model got an upgrade. Opus 4.6 plans more carefully, sustains agentic tasks for longer, operates reliably in massive codebases, and catches its own mistakes. It’s also our first Opus-class model with 1M token context in beta.

Pankaj Gupta retweeted

Greg Brockman @gdb · Feb 6

Software development is undergoing a renaissance in front of our eyes. If you haven't used the tools recently, you likely are underestimating what you're missing. Since December, there's been a step function improvement in what tools like Codex can do. Some great engineers at OpenAI yesterday told me that their job has fundamentally changed since December. Prior to then, they could use Codex for unit tests; now it writes essentially all the code and does a great deal of their operations and debugging. Not everyone has yet made that leap, but it's usually because of factors besides the capability of the model.

Every company faces the same opportunity now, and navigating it well (just like with cloud computing or the Internet) requires careful thought. This post shares how OpenAI is currently approaching retooling our teams towards agentic software development. We're still learning and iterating, but here's how we're thinking about it right now:

As a first step, by March 31st, we're aiming that:

(1) For any technical task, the tool of first resort for humans is interacting with an agent rather than using an editor or terminal.
(2) The default way humans utilize agents is explicitly evaluated as safe, but also productive enough that most workflows do not need additional permissions.

In order to get there, here's what we recommended to the team a few weeks ago:

1. Take the time to try out the tools. The tools do sell themselves: many people have had amazing experiences with 5.2 in Codex, after having churned from Codex web a few months ago. But many people are also so busy they haven't had a chance to try Codex yet, or got stuck thinking "is there any way it could do X" rather than just trying.
- Designate an "agents captain" for your team: the primary person responsible for thinking about how agents can be brought into the team's workflow.
- Share experiences or questions in a few designated internal channels.
- Take a day for a company-wide Codex hackathon.

2. Create skills and AGENTS.md.
- Create and maintain an AGENTS.md for any project you work on; update the AGENTS.md whenever the agent does something wrong or struggles with a task.
- Write skills for anything that you get Codex to do, and commit it to the skills directory in a shared repository.

3. Inventory and make accessible any internal tools.
- Maintain a list of tools that your team relies on, and make sure someone takes point on making it agent-accessible (such as via a CLI or MCP server).

4. Structure codebases to be agent-first. With the models changing so fast, this is still somewhat untrodden ground, and will require some exploration.
- Write tests which are quick to run, and create high-quality interfaces between components.

5. Say no to slop. Managing AI-generated code at scale is an emerging problem, and will require new processes and conventions to keep code quality high.
- Ensure that some human is accountable for any code that gets merged. As a code reviewer, maintain at least the same bar as you would for human-written code, and make sure the author understands what they're submitting.

6. Work on basic infra. There's a lot of room for everyone to build basic infrastructure, which can be guided by internal user feedback. The core tools are getting a lot better and more usable, but there's a lot of infrastructure that currently goes around the tools, such as observability, tracking not just the committed code but the agent trajectories that led to them, and central management of the tools that agents are able to use.

Overall, adopting tools like Codex is not just a technical but also a deep cultural change, with a lot of downstream implications to figure out. We encourage every manager to drive this with their team, and to think through other action items: for example, per item 5 above, what else can prevent a lot of "functionally-correct but poorly-maintainable code" from creeping into codebases.

Pankaj Gupta retweeted

Yupp @yupp_ai · Feb 3

🧪 New Feature Release: HTML/JS Mode

Build websites, games and interactive apps with Yupp! We've enabled an exciting new modality that lets you build real-world applications with HTML and JavaScript. How to use it 👇

Pankaj Gupta retweeted

Yupp @yupp_ai · Jan 29

📢 New Model Drop: Grok Imagine Image is live on Yupp! It combines powerful image gen with speed and cost savings. Congrats on the API drop, @xai! We can't wait to see our community's creations - and to discover how Grok Imagine Image performs on our user-preference leaderboards.

Pankaj Gupta retweeted

Subho @Subhobhai943 · Jan 21

Geometry #StrawberrySeed on @yupp_ai 📐🍓

Prompt: "Tell me the angle of A only."

✅ Gemini 3 Pro: 150° (Perfect logic 🧠)
❌ GPT-5: 20° (Hallucinated)
❌ Qwen: 120° (Stuck)

Visual reasoning is hard! yupp.ai/share/3181c226…

Pankaj Gupta retweeted

Fitriansya Zahra @Frianz13 · Jan 15

I tested 10 AI models on @yupp_ai, asking them to write NOTHING in their responses, not even a single character. Only 3 models passed the test; the rest failed. Tbh I can't stop laughing when reading their "model's thoughts" XD Check this out: yupp.ai/share/a798ceff…

Pankaj Gupta retweeted

fal @fal · Dec 29

Leading open-weight models on both Artificial Analysis & Yupp Benchmarks 👇

Pankaj Gupta retweeted

gcmouli @gcmouli · Dec 29

While devs obsess over the number of commits in 2025, and how timelines are getting compressed, and of course Claude Code etc - look at what we have been up to @yupp_ai. Look at the number of models we onboarded through the year!!! 🚀🚀🚀 Over and upwards 2026.

Pankaj Gupta retweeted

Yupp @yupp_ai · Dec 27

Guess the prompt:

Pankaj Gupta retweeted

Yupp @yupp_ai · Dec 25

As the year draws to a close, we’ve added a little festive cheer to our scratch cards. 🎇🎆 Which one did you get?

Pankaj Gupta retweeted

Yupp @yupp_ai · Dec 23

The second to last weekly roundup of 2025! We’ve had a busy week at Yupp: contest winners, art spotlights, leaderboard news, and welcoming new models. Here we go:

Pankaj Gupta @pankaj · Dec 23

Orange in the sky…

Pankaj Gupta retweeted

Yupp @yupp_ai · Dec 20

Having used GPT Image 1.5 to turn a cat pic into a plushie, we noticed an extra front paw. We wanted to fix that and remove the background for a great product shot. Both GPT Image 1.5 and Nano Banana Pro 2K managed to remove the extra paw, but GPT also removed a back paw! 🙀

Pankaj Gupta retweeted

Yupp @yupp_ai · Dec 20

AI Face-Off! Let’s pair off GPT Image 1.5 with other top image gen models on Yupp, taking them through some challenging prompts. Human preference data can reveal what technical benchmarks might not: which model is best for a variety of different real-world tasks?
