Shantanu Sharma
@shantanu
87 posts

he/him RT≠👍 NMLS # 1677482

New York, USA · Joined September 2008
136 Following · 673 Followers
Shantanu Sharma retweeted
USA Hockey@usahockey·
RED, WHITE, BLUE AND GOLD. TEAM USA IS BRINGING HOME THE #WINTEROLYMPICS GOLD. 🇺🇸🦅
Shantanu Sharma@shantanu·
Resilience is the word of the day! 🏠 The latest @MBAMortgage Chart of the Week shows the overall homeownership rate holding steady at 65.3%. While the 35–44 demographic navigates a shifting market, this stability creates a solid foundation for new solutions and opportunities in the year ahead. 📈 Insights from Dr. Eddie Seiler here: mba.org/news-and-resea… #HousingMarket #MortgageBankers #Homeownership #Economy #MBA #Growth
Shantanu Sharma@shantanu·
The Huajiang Grand Canyon Bridge opened to traffic for the first time on Sunday. The bridge showcases the region's engineering capabilities and supports its goal of becoming a world-class tourist destination. nypost.com/2025/10/01/wor…
Shantanu Sharma@shantanu·
Exciting breakthroughs with Biomni-R0: enabling agentic automation of biomedical research tasks with higher accuracy, using reinforcement learning to hill-climb biomedical reasoning agents toward expert-level performance. Technical report: biomni.stanford.edu/blog/biomni-r0… Looking forward to collaborating on the OSS code and to enhancing life science agentic models' ability to transfer learned reasoning skills to entirely new biomedical problems, toward a general biomedical reasoning agent. Thanks @RyanLi0802 @KexinHuang5 @ProjectBiomni @shiyi_c98 @NovaSkyAI @jure #BiomniR0 #BiomedicalAI #ReinforcementLearning #OpenSource #AIAgent #AIforScience #Biotech #Stanford #UCBerkeley
Jure Leskovec@jure

Reinforcement learning leads to better AI scientist agents! 🚀

By training models end-to-end with multi-turn RL, we’re seeing breakthroughs in reasoning and problem-solving for real biomedical research. Excited to introduce Biomni-R0 — an agentic LLM trained with this approach.

On 10 real research tasks, it nearly doubles performance over its open-source base model and even surpasses closed-source frontier models by >10%. A scalable path to expert-level AI in biomedicine.

Led by @RyanLi0802 @KexinHuang5 @ProjectBiomni with exciting collaboration with the SkyRL team @shiyi_c98 @NovaSkyAI. Learn more: biomni.stanford.edu/blog/biomni-r0… — open sourcing soon!

Shantanu Sharma retweeted
Greg Abbott@GregAbbott_TX·
Declared this Sunday, July 6th, as a Day of Prayer in Texas in response to the floods in the Hill Country. I invite Texans to join me in prayer for the communities affected by this disaster.
Shantanu Sharma retweeted
Brad Gerstner@altcap·
The INVEST AMERICA ACT. Passed July 4, 2025. Accounts established & funded July 4, 2026. Because every child deserves to share in the upside of America. Happy birthday America! Never perfect. Always rising.
Shantanu Sharma retweeted
Eric Topol@EricTopol·
A new cover for SUPER AGERS after making the NYT bestseller list. Thanks to you for making it the #1 ranked new non-fiction book on Amazon. amazon.com/gp/new-release…
Shantanu Sharma@shantanu·
Good read: The Leaderboard Illusion: alphaxiv.org/abs/2504.20879 It is not surprising that Big Tech, commercially dependent on marketing model performance for revenue, puts its best models out on Chatbot Arena. I would argue against prohibiting score retraction after submission and would instead encourage proprietary, open-weights, and open-source model providers to test as many model variants as resources permit and to publish all benchmark results. Agree on a non-repudiation audit trail for all models published on Chatbot Arena. It is also important to clearly and conspicuously disclose that real-world model performance will vary, since Chatbot Arena does not reflect how most real-world model applications are used.
Andrej Karpathy@karpathy

There's a new paper circulating looking in detail at LMArena leaderboard: "The Leaderboard Illusion" arxiv.org/abs/2504.20879

I first became a bit suspicious when at one point a while back, a Gemini model scored #1 way above the second best, but when I tried to switch for a few days it was worse than what I was used to. Conversely as an example, around the same time Claude 3.5 was a top tier model in my personal use but it ranked very low on the arena. I heard similar sentiments both online and in person. And there were a number of other relatively random models, often suspiciously small, with little to no real-world knowledge as far as I know, yet they ranked quite high too.

"When the data and the anecdotes disagree, the anecdotes are usually right." (Jeff Bezos on a recent pod, though I share the same experience personally).

I think these teams have placed different amount of internal focus and decision making around LM Arena scores specifically. And unfortunately they are not getting better models overall but better LM Arena models, whatever that is. Possibly something with a lot of nested lists, bullet points and emoji.

It's quite likely that LM Arena (and LLM providers) can continue to iterate and improve within this paradigm, but in addition I also have a new candidate in mind to potentially join the ranks of "top tier eval". It is the @openrouter LLM rankings: openrouter.ai/rankings

Basically, OpenRouter allows people/companies to quickly switch APIs between LLM providers. All of them have real use cases (not toy problems or puzzles), they have their own private evals, and all of them have an incentive to get their choices right, so by choosing one LLM over another they are directly voting for some combo of capability+cost.

I don't think OpenRouter is there just yet in both the quantity and diversity of use, but something of this kind I think has great potential to grow into a very nice, very difficult to game eval.

Shantanu Sharma retweeted
David Sacks@davidsacks47·
Congrats to the @AIatMeta team on the launch of their new Llama 4 open-weights models. For the U.S. to win the AI race, we have to win in open source too, and Llama 4 puts us back in the lead.
AI at Meta@AIatMeta

Today is the start of a new era of natively multimodal AI innovation. Today, we’re introducing the first Llama 4 models: Llama 4 Scout and Llama 4 Maverick — our most advanced models yet and the best in their class for multimodality.

Llama 4 Scout
• 17B-active-parameter model with 16 experts.
• Industry-leading context window of 10M tokens.
• Outperforms Gemma 3, Gemini 2.0 Flash-Lite and Mistral 3.1 across a broad range of widely accepted benchmarks.

Llama 4 Maverick
• 17B-active-parameter model with 128 experts.
• Best-in-class image grounding with the ability to align user prompts with relevant visual concepts and anchor model responses to regions in the image.
• Outperforms GPT-4o and Gemini 2.0 Flash across a broad range of widely accepted benchmarks.
• Achieves comparable results to DeepSeek v3 on reasoning and coding — at half the active parameters.
• Unparalleled performance-to-cost ratio with a chat version scoring ELO of 1417 on LMArena.

These models are our best yet thanks to distillation from Llama 4 Behemoth, our most powerful model yet. Llama 4 Behemoth is still in training and is currently seeing results that outperform GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on STEM-focused benchmarks. We’re excited to share more details about it even while it’s still in flight.

Read more about the first Llama 4 models, including training and benchmarks ➡️ go.fb.me/gmjohs
Download Llama 4 ➡️ go.fb.me/bwwhe9

Shantanu Sharma retweeted
Andrej Karpathy@karpathy·
New 2h11m YouTube video: How I Use LLMs

This video continues my general audience series. The last one focused on how LLMs are trained, so I wanted to follow up with a more practical guide of the entire LLM ecosystem, including lots of examples of use in my own life. Chapters give a sense of content:

00:00:00 Intro into the growing LLM ecosystem
00:02:54 ChatGPT interaction under the hood
00:13:12 Basic LLM interactions examples
00:18:03 Be aware of the model you're using, pricing tiers
00:22:54 Thinking models and when to use them
00:31:00 Tool use: internet search
00:42:04 Tool use: deep research
00:50:57 File uploads, adding documents to context
00:59:00 Tool use: python interpreter, messiness of the ecosystem
01:04:35 ChatGPT Advanced Data Analysis, figures, plots
01:09:00 Claude Artifacts, apps, diagrams
01:14:02 Cursor: Composer, writing code
01:22:28 Audio (Speech) Input/Output
01:27:37 Advanced Voice Mode aka true audio inside the model
01:37:09 NotebookLM, podcast generation
01:40:20 Image input, OCR
01:47:02 Image output, DALL-E, Ideogram, etc.
01:49:14 Video input, point and talk on app
01:52:23 Video output, Sora, Veo 2, etc etc.
01:53:29 ChatGPT memory, custom instructions
01:58:38 Custom GPTs
02:06:30 Summary

Link in the reply post 👇
Shantanu Sharma@shantanu·
Experimented with DeepSeek-R1 this weekend. DeepSeek-R1 moves from supervised fine-tuning to reinforcement learning for model training, enabling generation of longer chains of thought (CoT). Using Group Relative Policy Optimization (GRPO) to cut training costs makes the RL training approach more efficient. Exciting to see the model released under a permissive MIT license. Looking forward to seeing how far GRPO-based RL training can take reasoning, and how the costs of building and deploying models on enterprise datasets can be further reduced. alphaxiv.org/abs/2501.12948
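The group-relative trick at the heart of GRPO can be sketched in a few lines: rather than training a separate value model as a critic, the advantage of each sampled completion is its reward normalized against the other completions drawn for the same prompt. A minimal illustration, with hypothetical scalar rewards standing in for real reward-model scores:

```python
# Sketch of GRPO's group-relative advantage (assumed simplification of the
# DeepSeek-R1 recipe): sample a group of completions per prompt, score each,
# and normalize rewards within the group.
from statistics import mean, pstdev

def group_relative_advantages(rewards):
    """Advantage of each completion = (reward - group mean) / group std."""
    mu = mean(rewards)
    sigma = pstdev(rewards) or 1.0  # guard against a uniform group (std = 0)
    return [(r - mu) / sigma for r in rewards]

# One prompt, four sampled completions with hypothetical rewards:
advs = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Normalizing within the sampled group is what removes the need for a learned critic, which is where much of the training-cost saving comes from.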
Shantanu Sharma@shantanu·
"We have a great generic opioid overdose antidote, naloxone. It needs to be cheaper and available everywhere, not hidden behind pharmacy counters but placed near every defibrillator and in every first aid kit. Two medications — methadone and buprenorphine — have proved to cut the risk of death among people with opioid addictions by 50 percent or more when used long-term. They also need to be made accessible to all people with opioid addiction. Right now, most rehabs still fail to adequately provide them." nytimes.com/interactive/20… #EndOverdose #OverdoseAwareness #OverdosePrevention #HarmReduction #StopOverdose #OpioidCrisis #RecoveryIsPossible #FentanylAwareness #SubstanceAbuseAwareness #SaveALife #StopTheStigma
Shantanu Sharma retweeted
MLB@MLB·
SHOHEI OHTANI HAS DONE IT
50 HOME RUNS | 50 STOLEN BASES
HISTORY
Shantanu Sharma@shantanu·
Revolutionizing Finance with LLMs: An Overview of Applications and Insights.

Large Language Models (LLMs) are reshaping the finance landscape, offering novel capabilities in processing textual data and zero-shot learning. From sentiment analysis to fraud detection, LLMs are proving to be instrumental in various computational finance tasks.

The financial sector, however, presents challenges in applying LLMs due to its highly specialized and complex data. Financial terminology, evolving regulations, and market dynamics require a high level of model comprehension, ensuring accurate and reliable predictions—an essential aspect given the high-risk nature of financial decision-making.

Currently playing a supportive role, LLMs have the potential to transform financial decision-making by enhancing existing models. The synergy between LLMs and quantitative models is poised to usher in a new era of innovation and efficiency in finance. Exciting possibilities await as LLMs evolve, promising a future where cutting-edge technology drives advancements in finance.

Explore more insights on this transformative trend: arxiv.org/html/2401.1164… #llm #genai #machinelearning #enterpriseai #fintech #financialservices #bfsi #tech
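As one concrete instance of the zero-shot capability mentioned above, sentiment classification needs nothing but a label set in the prompt, with no fine-tuning. A minimal sketch: `build_zero_shot_prompt` shows the prompt shape, and `classify` substitutes a keyword heuristic for the actual model call so the example runs offline; all names here are illustrative, not any specific API.

```python
# Zero-shot sentiment classification of financial headlines (illustrative sketch).
LABELS = ("positive", "negative", "neutral")

def build_zero_shot_prompt(headline):
    """The full task specification lives in the prompt; no labeled training data."""
    return (f"Classify the sentiment of this financial headline as one of "
            f"{', '.join(LABELS)}.\nHeadline: {headline}\nSentiment:")

def classify(headline):
    # Stand-in for an LLM completion; a real system would send
    # build_zero_shot_prompt(headline) to a model and parse the reply.
    text = headline.lower()
    if any(w in text for w in ("beats", "surges", "record")):
        return "positive"
    if any(w in text for w in ("misses", "falls", "fraud")):
        return "negative"
    return "neutral"
```

In production the heuristic body of `classify` would be replaced by a model call, but the prompt-only task definition is the point of the zero-shot pattern.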
Shantanu Sharma@shantanu·
Writer's Palmyra-Fin-70B-32K model has passed the CFA Level III exam, showing leading performance on their internal long-fin-eval benchmark. The model is designed to excel at analyzing and summarizing complex financial reports, market data, and economic indicators, efficiently extracting key information into structured summaries. Palmyra-Fin-70B-32K stands out as a leading LLM on financial benchmarks, surpassing other large language models across a range of financial tasks and evaluations. Try it out here: 🔗 build.nvidia.com/writer/palmyra… 🔗 writer.com/engineering/pa… 🔗 huggingface.co/Writer/Palmyra… #FinancialServices #FinTech #LLM #NLP #MachineLearning
Shantanu Sharma@shantanu·
Survey paper on Large Language Models in Finance.

The authors delve into current approaches employing LLMs in finance, from leveraging pretrained models via zero-shot or few-shot learning to training custom LLMs from scratch, addressing the industry's need for accuracy and fairness.

Key challenges highlighted are the production of disinformation and the manifestation of biases (racial, gender, religious) in LLMs, emphasizing how critical trustworthy financial information and unbiased services are. To enhance accuracy and combat biases, techniques like retrieval-augmented generation, content censoring, and output restriction can be implemented, ensuring control over generated content and minimizing bias.

Despite offering more interpretability than traditional deep learning models, LLMs remain a black box, limiting explainability. This poses regulatory and governance challenges in the financial landscape. #llm #genai #ml #machinelearning #bfsi #lending

Read more: arxiv.org/pdf/2311.10723…
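Retrieval-augmented generation, one of the accuracy techniques the survey highlights, can be sketched as retrieve-then-prompt: fetch the document snippets most relevant to the question, then instruct the model to answer only from them. The word-overlap scorer below is a toy stand-in for a real vector index, and every name here is illustrative rather than any specific system:

```python
# Toy sketch of the retrieval-augmented generation (RAG) pattern for
# grounding financial answers in source documents.
def retrieve(query, documents, k=2):
    """Rank documents by naive word overlap with the query
    (stand-in for embedding similarity search)."""
    q = set(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query, documents):
    """Condition the model on retrieved context to curb hallucination."""
    context = "\n".join(retrieve(query, documents))
    return (f"Answer using only the context below.\n"
            f"Context:\n{context}\nQuestion: {query}")
```

Restricting the answer to retrieved context is also where the survey's "output restriction" fits: the prompt bounds what the model may assert, which matters when the output feeds financial decisions.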