Pankaj Gupta

601 posts

@defpan

Co-founder @basetenco working on ML model performance

Bay Area · Joined September 2011
899 Following · 326 Followers
Pankaj Gupta @defpan
@rapprach Absolutely thrilled about this milestone. This is a true turning point for cold starts. What’s coming next is going to be truly mind blowing. Watch this space!
Pankaj Gupta reposted
AT @AliesTaha
- 230 training runs
- 1,623 GPU hours (67 B200 days)
- 76 TB of training data
- a 2x faster model

Every paper said it can't be done. Quantization Aware Distillation made it possible.
AT @AliesTaha

x.com/i/article/2029…

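The thread gives the headline numbers but not the recipe. A minimal sketch of the two ingredients of quantization-aware distillation, assuming a symmetric low-bit weight grid and the standard temperature-softened KL objective; the function names are illustrative, not taken from the linked article:

```python
import numpy as np

def fake_quantize(w: np.ndarray, bits: int = 4) -> np.ndarray:
    """Round weights to a symmetric low-bit grid but keep them as floats,
    so the student 'sees' quantization error during training."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

def softmax(x: np.ndarray, t: float = 1.0) -> np.ndarray:
    z = np.exp((x - x.max()) / t)
    return z / z.sum()

def distill_loss(student_logits, teacher_logits, t: float = 2.0) -> float:
    """KL(teacher || student) on temperature-softened distributions:
    the classic distillation objective."""
    p = softmax(np.asarray(teacher_logits, dtype=float), t)
    q = softmax(np.asarray(student_logits, dtype=float), t)
    return float(np.sum(p * (np.log(p) - np.log(q))) * t * t)
```

Training a quantized student against a full-precision teacher's soft labels, rather than against hard targets alone, is what lets the low-bit model recover accuracy the rounding would otherwise cost.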
Pankaj Gupta @defpan
Had such a blast!
Baseten @baseten

Earlier this month, we hosted our biannual company-wide offsite and gathered 180 teammates in Austin, TX. Highlights included:
- talent show
- a chat with @saranormous about the evolution of the inference market
- fireside chat with @EvidenceOpen
- hackathon
- a Texas ranch experience

Within the last year, Baseten has moved faster than ever before. With 4X team growth, 12X revenue growth, and 3 separate fundraises, it's hard to believe how far we've come. At that pace, alignment doesn't just happen. Our offsites enable us to celebrate wins, strengthen relationships across teams, and align on the next few months.

And we're just getting started. If this sounds exciting to you, join us! baseten.co/careers

Pankaj Gupta reposted
Baseten @baseten
We painted San Francisco green and pink, and the message is clear — you need to own your inference. If you spot us around the city, share a picture with us. We’ll send you something!
Pankaj Gupta reposted
World Labs @theworldlabs
We’re building foundational world models to power the next era of 3D. From robotics to gaming, spatial intelligence unlocks entirely new worlds. Powered by inference at scale – shoutout to Baseten.
Pankaj Gupta reposted
Jeff Huber @jeffreyhuber
the bar has been raised for book printing. thanks @philipkiely for the copy!
Pankaj Gupta @defpan
Inference is hard to learn because there are so many moving pieces. Now you can see the whole stack in one place.
Pankaj Gupta reposted
Baseten @baseten
Generational AI companies are powered by Baseten. Why? We obsess over the milliseconds, so they can ship the future. Focus on what actually differentiates you. Leave the inference to us.
Pankaj Gupta reposted
AT @AliesTaha
We quantized the best open-source diffusion model on the market to 4 bits: huge speedup, (almost) no quality loss. This is a full explanation of the trillion-dollar industry's oldest trick.
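For context on what 4-bit quantization actually does, here is a minimal per-channel int4 round-trip in NumPy. Everything in it (function names, shapes, the toy weight matrix) is illustrative rather than taken from the post:

```python
import numpy as np

def quantize_per_channel(W: np.ndarray, bits: int = 4):
    """Map each output channel (row) to signed int4 values in [-8, 7],
    storing one float scale per row."""
    qmax = 2 ** (bits - 1) - 1                       # 7 for signed 4-bit
    scales = np.abs(W).max(axis=1, keepdims=True) / qmax
    Q = np.clip(np.round(W / scales), -qmax - 1, qmax).astype(np.int8)
    return Q, scales

def dequantize(Q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Recover approximate float weights from int4 values and per-row scales."""
    return Q.astype(np.float32) * scales

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16)).astype(np.float32)      # toy weight matrix
Q, scales = quantize_per_channel(W)
W_hat = dequantize(Q, scales)
rel_err = float(np.abs(W - W_hat).mean() / np.abs(W).mean())
```

Signed 4-bit storage is 8x smaller than fp32, which is where most of the speedup comes from on memory-bandwidth-bound inference; the small relative round-trip error is the "(almost) no quality loss" trade-off at the matrix level.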
Pankaj Gupta reposted
Baseten @baseten
Introducing Kimi K2.5 on Baseten's Model APIs with the most performant TTFT (0.26 sec) and TPS (340) on Artificial Analysis. Even among a landscape of incredible open source models, Kimi K2.5 stands out with its multi-modal capabilities and its ability to accommodate an alarmingly large number of tool calls. Get the good stuff here: baseten.co/library/kimi-k…
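TTFT and TPS are straightforward to measure against any streaming endpoint. A sketch, where the hypothetical fake_stream generator stands in for a real streaming API and its delays are invented:

```python
import time

def measure_streaming(token_stream):
    """Time-to-first-token (TTFT) and decode tokens/sec (TPS) for any
    iterator that yields tokens as they are generated."""
    start = time.perf_counter()
    first = None
    count = 0
    for _tok in token_stream:
        if first is None:
            first = time.perf_counter()
        count += 1
    end = time.perf_counter()
    ttft = first - start
    # TPS is measured over the decode phase only (tokens after the first),
    # so prefill latency doesn't dilute the throughput number.
    tps = (count - 1) / (end - first) if count > 1 and end > first else 0.0
    return ttft, tps

def fake_stream(n: int = 10, prefill: float = 0.05, per_token: float = 0.01):
    """Hypothetical stand-in for a streaming model API."""
    time.sleep(prefill)                   # prefill latency before the first token
    for i in range(n):
        if i:
            time.sleep(per_token)         # steady decode cadence
        yield f"tok{i}"
```

Separating TTFT from TPS matters because the two are dominated by different phases (prefill vs. decode), which is why leaderboards like Artificial Analysis report both.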
Pankaj Gupta @defpan
RT @tuhinone: The biggest hurdle to widespread AI adoption isn't just model capability, it's the cost and speed of inference. At Baseten, o…
Pankaj Gupta reposted
Baseten @baseten
We boosted acceptance rate by up to 40% with the Baseten Speculation Engine. How? By combining Multi-Token Prediction (MTP) with Suffix Automaton (SA) decoding. This hybrid approach crushes production coding workloads, delivering 30%+ longer acceptance lengths on code editing tasks with zero added overhead. An open source version for TensorRT-LLM is now available to the community. Read the full engineering deep dive: baseten.co/blog/boosting-…
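The linked deep dive has the real details. As a toy illustration of the suffix-matching half of the idea, this sketch drafts the continuation of the longest context suffix that appeared earlier (a brute-force stand-in for a suffix-automaton lookup) and accepts only the prefix the target model agrees with; all names here are hypothetical:

```python
def suffix_match_draft(context: list, k: int = 4) -> list:
    """Draft up to k tokens by finding the longest suffix of the context
    that also occurs earlier in it, and copying what followed that
    earlier occurrence."""
    n = len(context)
    for length in range(min(32, n - 1), 0, -1):
        suffix = context[n - length:]
        # Brute-force search for an earlier occurrence of this suffix.
        for start in range(n - length - 1, -1, -1):
            if context[start:start + length] == suffix:
                continuation = context[start + length:start + length + k]
                if continuation:
                    return continuation
    return []

def accept_length(draft: list, target_next: list) -> int:
    """Length of the longest draft prefix the target model agrees with;
    only accepted tokens are kept, so output matches the target exactly."""
    accepted = 0
    for d, t in zip(draft, target_next):
        if d != t:
            break
        accepted += 1
    return accepted
```

Verification keeps the output identical to what the target model would have produced on its own; the speedup comes from checking several drafted tokens in one target forward pass, and repetitive inputs like code edits let the draft match many tokens at once.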
Pankaj Gupta reposted
Baseten @baseten
"the best application layer companies set up the harness and how to use it for the problem that your user is trying to solve"
Pankaj Gupta reposted
Tuhin Srivastava @tuhinone
Baseten's day 0 bet was that inference was the technology that would enable the best user experiences AI could deliver: fast, smart, reliable, secure. And that those experiences would rely not only on a handful of giant general intelligence models, but millions of specialized models built by companies for their specific customers and use cases. Whether you're a doctor, developer, lawyer, mechanic, researcher, construction worker, marketer, etc., you're accelerated by specialized tools worthy of your craft. To me, this is one of the most meaningful promises AI can deliver on.

We're starting to see it now. Many of the main-character AI companies on the application layer are built on highly-specialized models for highly-specialized workflows (Abridge, Clay, Cursor, OpenEvidence, Hebbia, Mercor, Notion); these businesses are booming because customers love specialized tools. There are probably hundreds of custom models in production today. Soon, there will be thousands and then millions. All enabled by a high-performing inference layer.

Inference has emerged as one of the hardest problems in modern AI systems. Delivering reliable, low-latency experiences requires deep coordination across distributed infrastructure, kernel-level performance, and software ergonomics; even world-class teams struggle to do this well. As a result, as consumers and developers, we've grown to accept sluggish performance, frequent downtime, and inconsistent quality across both application companies and model providers.

Meanwhile, the demands on inference are accelerating: AI adoption is trending towards ubiquity with reasoning models that are orders of magnitude more compute-intensive. This will only increase as more companies catch on to the virtues of owning their end-to-end IP rather than relying on black-box model APIs on shared infrastructure. Whether we can realize the impact of this generational shift will depend on our ability to serve these models reliably at scale.
We knew we could make the technology work, but the biggest delight of it all has been seeing what our customers do with it. The (many-model) future is bright.
Baseten @baseten

We’re thrilled to announce that we have raised $300M at a $5B valuation. The round is led by IVP and CapitalG, both doubling down on their investment in Baseten, and joined by 01A, Altimeter, Battery Ventures, BOND, BoxGroup, Blackbird Ventures, Conviction, Greylock, and NVIDIA. Read more here: baseten.co/blog/announcin…

Pankaj Gupta reposted
Baseten @baseten
We’re thrilled to announce that we have raised $300M at a $5B valuation. The round is led by IVP and CapitalG, both doubling down on their investment in Baseten, and joined by 01A, Altimeter, Battery Ventures, BOND, BoxGroup, Blackbird Ventures, Conviction, Greylock, and NVIDIA. Read more here: baseten.co/blog/announcin…
Pankaj Gupta @defpan
@baseten Incredibly proud of the team for turning hard optimization problems into real-world wins.
Pankaj Gupta reposted
Baseten @baseten
Tired of waiting for video generation? Say less. We've optimized the Wan 2.2 runtime to hit: 3x faster inference on NVIDIA Blackwell, 2.5x faster on Hopper, 67% cost reduction. Read the full breakdown of our kernel optimizations and benchmarks here: baseten.co/blog/wan-2-2-v…