Vikram Chatterji

101 posts

@vikramchatterji

Co-founder @rungalileo GenAI Development Platform // Prev Product @GoogleAI

San Francisco, CA · Joined October 2009

230 Following · 250 Followers

Pinned Tweet

Vikram Chatterji@vikramchatterji·4 May

Excited to finally talk about what we have been working on at @rungalileo for the past year! Building the data intelligence toolchain for ML developers working with unstructured data. Launch announcement: t.ly/wm9_ More here: t.ly/Alh2

Vikram Chatterji retweeted

Galileo@rungalileo·26 Jun

Scaling agentic systems means preparing for hundreds or even thousands of agents operating in production. But achieving this level of scale brings critical new questions: – How will you provision and orchestrate these agents? – What about authentication and authorization? – How will you evaluate, measure, and ensure reliability? @crewAIInc CEO and Co-founder @joaomdmoura joined us on the Chain of Thought podcast to discuss the emerging agentic stack, and why the agentic future will require an entire ecosystem, from your databases to your user interfaces. We’re proud to partner with industry leaders like CrewAI, learn more in this week’s episode with João, @vikramchatterji, and @ConorBronsdon 👇

Vikram Chatterji retweeted

AMD@AMD·21 Jun

ICYMI: Recorded live at #AdvancingAI, check out this special episode of Chain of Thought podcast hosted by @ConorBronsdon with @AnushElangovan and @realSharonZhou from @AIatAMD

Galileo@rungalileo

How is an open ecosystem powering the next generation of AI for developers? Recording live from the heart of the action at @AMD's Advancing AI 2025, Chain of Thought host @ConorBronsdon welcomes AMD’s @AnushElangovan, VP of AI Software, and @realSharonZhou, VP of AI. Together they unpack AMD's groundbreaking transformation from a hardware giant to a leader in full-stack AI, committed to an open ecosystem. Discover how new MI350 GPUs deliver mind-blowing performance with advanced data types and why ROCm 7 and AMD Developer Cloud offer Day Zero support for frontier models. This relentless pace of hardware and software innovation is reshaping the AI landscape. Then Conor welcomes Sharon Zhou, VP of AI at AMD, to discuss making AMD's powerful software stack truly accessible and how to drive developer curiosity. Sharon explains strategies for creating a "happy path" for community contributions, fostering engagement through teaching, and listening to developers at every stage. She shares her predictions for the future, including the rise of self-improving AI, the critical role of heterogeneous compute, and the potential of "vibes based feedback" to guide models. This vision for democratizing access to high-performance AI, driven by a deep understanding of the developer journey, promises to unlock the next generation of applications. 
00:00 Live from AMD's Advancing AI 2025 Event
00:30 Introduction to Anush Elangovan
01:38 The MI350 GPU Series Unveiled
04:57 CDNA4 Architecture Explained
07:00 The Future of AI Infrastructure
08:32 AMD's Developer Cloud and ROCm 7
11:50 Cultural Shift at AMD
14:48 Open Source and Community Contributions
18:35 Software Longevity and Ecosystem Strategy
22:19 AI Agents and Performance Gains
27:36 AI's Role in Solving Power Challenges
28:11 Thanking Anush
28:42 Introduction to Sharon Zhou
29:45 Sharon's Focus at AMD
30:39 Engaging Developers with AMD's AI Tools
31:24 Listening to the AI Community
33:56 Open Source and AI Development
45:04 Future of AI and Self-Improving Models
48:04 Final Thoughts and Farewell

Vikram Chatterji retweeted

Galileo@rungalileo·11 Jun

Debugging agents shouldn’t feel like detective work. Today, we’re excited to release two new AI agent interfaces that make agent observability & evaluations even more effective.
🔎 Timeline View – See execution flow and bottlenecks at a glance. No more guessing where your agent gets stuck.
💬 Conversation View – Experience exactly what your users see. Debug from the user's perspective, not just the system's.
Combined with last week's Graph View, you now have three complementary ways to debug your agents:
→ Graph: Visualize decision paths and tool usage
→ Timeline: Spot performance bottlenecks instantly
→ Conversation: See the user experience end-to-end
AI evaluations + observability are crucial to building reliable AI. These interfaces make it simpler to identify blockers and improve your agents faster. See all three views in action, and try it free with the link below 👇

Vikram Chatterji retweeted

Galileo@rungalileo·28 Feb

🔥 JUST RELEASED: Our Latest AI Agent Leaderboard Shows Surprising Results
We've just updated our AI Agent Leaderboard at Galileo, and the performance rankings challenge conventional wisdom about which models deliver the best value for AI agents. The headline finding: Gemini-2.0-flash-lite dominates with a 0.933 performance score, outperforming GPT-4.5 at a fraction of the cost.
Three critical insights from our comprehensive evaluation:
• Performance-to-Cost Ratio: The top 3 models and GPT-4.5 span a staggering 1000x price difference while showing only a 2% performance gap. This raises important questions about cost efficiency in production AI agents.
• Open Source Progress: Mistral-small-2501 leads the open source category at 0.83, performing on par with GPT-4o-mini. This signals the growing maturity of open source models for tool-calling capabilities.
• Model Performance Hierarchy: Claude-3.7-sonnet (0.953) > Gemini-2.0-flash (0.938) > GPT-4.5 preview (0.900) demonstrates a clear performance ranking across the major AI providers.
Our evaluation covered 20 models across 14 diverse datasets, assessing real-world AI agent capabilities and tool selection quality.
What's Next? We're raising the bar. Our upcoming evaluations will incorporate more challenging metrics focused on real-world scenarios with additional complex and specific datasets. As AI agents grow more sophisticated, the foundation models powering them must improve in decision quality, goal alignment, and task completion, all while maintaining reasonable costs for builders. What other metrics or test cases would you like to see in our next evaluation? Check out the full updated leaderboard and methodology below 👇

Vikram Chatterji retweeted

Galileo@rungalileo·25 Feb

Breaking: Claude 3.7 Sonnet claims the top spot on our AI Agent Leaderboard! Our comprehensive evaluation shows @AnthropicAI's newest model achieving a 0.953 TSQ score, narrowly edging out Gemini 2.0-flash (0.938) and GPT-4o (0.900). Looking at the performance-to-cost ratio reveals an interesting story: While Claude 3.7 delivers exceptional performance, Gemini 2.0-flash still offers remarkable value at just $0.15/$0.60 per million tokens, 20x cheaper than some competitors with comparable capabilities. As @OfficialLoganK + @ConorBronsdon discussed in our recent podcast, tool calling capabilities continue to evolve rapidly. The competitive landscape shows how quickly models are advancing in their ability to accurately select tools, orchestrate multi-step processes, and handle edge cases. Check out our updated leaderboard to see how your preferred model stacks up 👇

Vikram Chatterji@vikramchatterji·22 Feb

With the cost of intelligence and compute going down fast, the 'quality' of your AI product is the moat. A well-thought-through layer of offline and online evals makes the difference between otherwise commoditized product experiences across AI verticals.

Garry Tan@garrytan

Evals are emerging as the real moat for AI startups Hard won insights about customers and their business logic discovered by founders acting almost as ethnographers spelunking in the underserved slices of the GDP pie chart

Vikram Chatterji retweeted

Alex Rudall@alexrudall·18 Jan

Chain of Thought from @rungalileo is the best podcast I’ve found so far that’s focused on productionising genAI. Worth a listen

Vikram Chatterji retweeted

Bob van Luijt@bobvanluijt·28 Oct

🤩 Excited to join the @rungalileo virtual panel tomorrow! 💡 Join us as we dive into Agentic Architectures, Generative Feedback Loops, and more 🗣️ with: @_brian_raymond , @joaomdmoura, @vikramchatterji, and yours truly 📅 Details: October 29th, Virtual, Free 🎟️ Register here: galileo.ai/genai-producti…

Vikram Chatterji retweeted

Unstructured@UnstructuredIO·28 Oct

Excited for @rungalileo GenAI Productionize - happening tomorrow! Sign up below to watch @_Brian_Raymond and other leaders chat on the latest in AI agents and GenAI. 📅 Details: October 29th, Virtual, Free 🎟️ Register here: galileo.ai/genai-producti…

Vikram Chatterji retweeted

Forbes@Forbes·15 Oct

This AI Startup Raises $45 Million To Make Sure AI Models Don’t Hallucinate Or Leak Data trib.al/d2SzxyS

Vikram Chatterji retweeted

Stack Overflow@StackOverflow·22 May

🎙️We chat with @vikramchatterji, founder and CEO of @rungalileo, about the challenges of evaluating GenAI models, the importance of data quality in AI systems, and the trade-offs between using pre-trained models and fine-tuning models with custom data. stackoverflow.blog/2024/05/21/how…

Vikram Chatterji@vikramchatterji·6 Mar

@surojit Congrats @surojit !!

surojit@surojit·5 Mar

Introducing Ema 🚀 I'm thrilled to finally unveil Ema, Enterprise Machine Assistant, as we emerge from stealth mode! As the founder, this moment is incredibly special to me, marking the culmination of countless hours of hard work, innovation, and dedication from our amazing team. Ema isn't just another AI tool; it's a game-changer in the world of work. Our Universal AI Employee is poised to revolutionize how teams collaborate, communicate, and excel in today's dynamic landscape. Using our patent-pending technology, Ema boosts productivity across every role in the enterprise by automating any complex workflow, standard or specialized, with a simple conversation. She is simple to use, trusted and highly accurate. The early traction we've gained with enterprise customers like Envoy Global, Moneyview, TrueLayer and many others, speaks volumes about the value Ema brings to the table. It's humbling to see our vision resonate with businesses seeking to embrace the future of work. But this is just the beginning. We're driven by a relentless passion to push boundaries, challenge norms, and unlock new possibilities. The journey ahead is bound to be exhilarating, and we invite you all to join us on this exciting ride. Thank you to our customers, colleagues, partners, investors and mentors for your support over the past year. Together, we're defining the future of work—one persona at a time! 🚀✨ #EmaLaunch #futureofwork #UniversalAIEmployee #Innovation #GenerativeAI Find out why you should hire Ema, your Universal AI Employee. Follow our journey at @Ema_Unlimited. ema.co/resources/blog… youtu.be/QEhrXykNlnE

Vikram Chatterji@vikramchatterji·14 Jan

@ashah0052 @TheTuringPost @ashah0052 here's more on the methodology behind the index: rungalileo.io/hallucinationi… Tl;dr The ranking is based on a combo of the ChainPoll method (desc in the link above), and human reviewers as an additional check, across multiple popular datasets.

Ankit Shah@ankits0052·14 Jan

@TheTuringPost how does one compute the index at their end?

Ksenia_TuringPost@TheTuringPost·10 Jan

LLM Hallucination Index is the first open-source leaderboard to rank popular LLMs based on their hallucination score. ▪️ 3 task types: question & answer w/ and w/o RAG, and long-form text generation ▪️ Evaluated using 7 popular datasets Here is how you can use it:

Vikram Chatterji retweeted

agentnative@agentnativedev·24 Dec

LLM Hallucination Index is practitioner focused, intuitive and straight to the point. “Open AI's GPT-4-0613 performed the best and was least likely to hallucinate for Question & Answer with RAG. Huggingface's Zephyr-7b was the best-performing open-source model, outperforming Meta's 10x larger Llama-2-70b, proving larger models are not always better.” It will take some time to build trust in enterprise deployments, but we are getting there! Great work from @rungalileo! rungalileo.io/hallucinationi…

Vikram Chatterji retweeted

Galileo@rungalileo·17 Nov

1/ This week, we launched the Hallucination Index, which ranks popular LLMs on their propensity to hallucinate for common GenAI tasks. We evaluated 11 LLMs across 3 GenAI tasks using 2 powerful metrics. Here’s what we found👇 #AI #LLM #Hallucinations #HallucinationIndex

Vikram Chatterji@vikramchatterji·15 Nov

Excited to unveil the Hallucination Index! Dive into the rankings and the methodology: rungalileo.io/hallucinationi… The goal is purely to help builders work with the right LLM for their nuanced tasks.

Galileo@rungalileo

🚀 Unveiling the Hallucination Index! 🚀 Evaluating LLM output quality across real-world tasks, it addresses the challenge of hallucinations with a structured framework. Learn more: rungalileo.io/hallucinationi… 🌐✨ #AI #Hallucinations #GenerativeAI

Vikram Chatterji retweeted

Battery Ventures@BatteryVentures·15 Aug

What we're reading today: @fortunemagazine's profile of Battery portfolio company @rungalileo and its co-founders @YashSheth46, @atinsanyal and @vikramchatterji! Check out the full story from @polina_marinova here: fortune.com/2023/08/10/yas…

Vikram Chatterji@vikramchatterji·25 Jun

@karpathy Super critical to build custom LMs with the *right*, *high-quality* data that is *contextual* to the use case -- rungalileo.io/llm-studio is an LLM prompt and diagnostics tool that aims to turbocharge exactly that!

Andrej Karpathy@karpathy·21 Jun

"Textbooks Are All You Need" is making rounds: x.com/sebastienbubec… reminding me of my earlier tweet :). TinyStories is also an inspiring read: x.com/eldanronen/sta… We'll probably see a lot more creative "scaling down" work: prioritizing data quality and diversity over quantity, a lot more synthetic data generation, and small but highly capable expert models.

Andrej Karpathy@karpathy

Seems likely we’ll have custom (and partially auto-generated) “textbooks” but for teaching language models, not humans, to help them “grok” concepts.

Vikram Chatterji@vikramchatterji·1 Apr

Thousands of people at the Exploratorium for an ML meetup? This is what a new wave in technology looks like. Such great ideas and energy at the @huggingface open source AI event today! #WoodstockAI

San Francisco, CA 🇺🇸

Vikram Chatterji retweeted

Pete Warden@petewarden·3 Nov

I'm excited that @rungalileo are finally public! They've built a wonderful tool for automatically cleaning up NLP training data, it's an easy way to boost your model accuracy.
