Kush Varshney कुश वार्ष्णेय

8.4K posts

@krvarshney

I wrote a book. Free pdf: https://t.co/rFFL7mySnS Paperback: https://t.co/lF0IgC5T9z Tweets are my own and don't necessarily represent IBM.

Chappaqua, NY · Joined March 2012
647 Following · 3.2K Followers
Kush Varshney कुश वार्ष्णेय retweeted
Paul @psschwei
A blog on how to get frontier-level results from small language models by letting Mellea and Granite Libraries do the heavy lifting: mellea.ai/blogs/small-mo…
rishi @RishiBommasani
I like the analogy. Notably, in the restaurant world, only one of these is even afforded the word "open": option 3 is an "open kitchen" restaurant. (I don't think all such restaurants would appreciate the customer shouting at the chef, but let's put that aside.)
Though maybe there is some mismatch in the analogy:
- Option 1 is just "you get the food". The analogue is "you get the model". This probably collapses open weight with everything less open than it, since we don't distinguish weights vs. API in food as far as I can imagine, and there is certainly no local vs. cloud distinction for food.
- Option 2 is "you get the food and the recipe". I think this is a bit of a mismatch with open source, since the recipe is transparency (i.e., information about how to build) but not the actual ingredients themselves (whereas with open source you might, or do, have the dataset in some stronger sense). But it is worth noting that in both cases you are not given the cooking infrastructure or compute infrastructure to consume the ingredients and produce the food.
One other subtlety is that open-kitchen restaurants are not fully open due to constraints: chefs do prep work so that the cook time in front of the diner is of reasonable length (e.g., an omakase restaurant needs to prepare rice in advance). That's fine because the customer doesn't need 100% openness or to see every gory detail, but it's not fine for researchers.
Percy Liang @percyliang
I find myself repeatedly explaining the difference between open-weight (DeepSeek), open-source (Olmo), and open-development (Marin). Let's see if this restaurant analogy helps:
- Open-weight: food is made behind closed doors; the server brings you the dish.
- Open-source: food is made behind closed doors; the server brings you the dish and the recipe.
- Open-development: you see the chef make the dish in the kitchen (and can shout suggestions while it's cooking)!
Kush Varshney कुश वार्ष्णेय retweeted
Javier Carnerero Cano @ccanojavi
📊 We also introduce VELI5, a new dataset with controlled factual errors + ground-truth fixes. This dataset has already been used to fine-tune state-of-the-art factuality guardrails such as Granite Guardian [huggingface.co/ibm-granite/gr…]. (3/4)
Kush Varshney कुश वार्ष्णेय retweeted
aipulsedaily @aipulseda1ly
IBM just dropped Granite 4.1, their largest model release to date: language, vision, speech, embeddings, and safety models all in one drop.
The 8B instruct model reportedly matches their previous 32B MoE on instruction following and tool calling.
Guardian 4.1 does risk and policy scoring with calibrated confidence levels instead of binary yes/no filtering, which is a smarter approach for enterprise deployment.
All Apache 2.0, available on HuggingFace, Ollama, and watsonx.
IBM is quietly building a full enterprise AI stack: research.ibm.com/blog/granite-4…
Kush Varshney कुश वार्ष्णेय retweeted
AqibAi @Aqib__786Ai
IBM is clearly doubling down on a very specific lane here: practical, efficient, enterprise-ready models rather than chasing leaderboard dominance. Granite 4.1 feels like a continuation of that philosophy, especially the 8B.
That 4M token usage vs. 78M on Qwen is kind of wild. In real deployments, that translates directly into:
- lower latency
- dramatically lower cost
- easier scaling for agent workflows
That honestly matters more than raw benchmark scores for most companies.
The tradeoff is obvious, though: you’re giving up peak intelligence. A 12 vs. 15 score doesn’t sound huge, but in practice that gap can show up in:
- reasoning depth
- edge-case handling
- coding reliability
So these aren’t “frontier competitors”; they’re workhorse models.
What’s arguably more important is the Apache 2.0 + openness push. That 61 Openness Index score puts IBM ahead of most “open-ish” players like Alibaba (Qwen) and Google (Gemma). For enterprises, that’s a big deal:
- fewer licensing headaches
- more control over deployment (on-prem / air-gapped)
- an easier compliance story
The positioning is pretty clear:
- Granite 3B → edge / lightweight agents
- Granite 8B → sweet spot (cost vs. capability)
- Granite 30B → heavier enterprise workloads where you still want efficiency
The most interesting signal here isn’t the scores; it’s the token-efficiency trend. If models like this keep improving, the industry might shift from “bigger is better” to “good enough intelligence, but 10–20x cheaper to run”. And that’s where adoption really explodes.
Curious part: if someone pairs Granite 8B with strong retrieval + tools, it could close a lot of that intelligence gap without losing its cost advantage. That’s probably the real play.
Kush Varshney कुश वार्ष्णेय retweeted
Artificial Analysis @ArtificialAnlys
IBM has released three new non-reasoning Granite 4.1 models (30B, 8B, 3B) as open weights under Apache 2.0. All three are notably token-efficient relative to peer non-reasoning models, with the 8B standing out for its token efficiency relative to intelligence.
@IBM has released three new instruct models in the Granite 4.1 family: Granite 4.1 30B (15 on the Intelligence Index), Granite 4.1 8B (12), and Granite 4.1 3B (9). The release continues IBM's focus on small, efficient, and open models for enterprise and edge deployment, alongside the existing Granite 4.0 Nano family (1B and 350M variants released in October 2025). The Intelligence Index is the Artificial Analysis synthesis metric incorporating 10 evaluations covering agentic tasks, coding, and scientific reasoning.
Key benchmarking results:
➤ All three Granite 4.1 models score 61 on the Artificial Analysis Openness Index, standing out among peer open weights non-reasoning models. This is driven by full open weights under Apache 2.0 plus partial disclosures across pre-training data, post-training data, and training methodology. Granite 4.1 sits well above peers like Qwen3.5 (39), Gemma 4 (39) and GLM-4.7-Flash (44), and represents a meaningful improvement over the Granite 4.0 family (56), driven by stronger methodology disclosure. Olmo 3.1 and K2 Think V2 (both 89) remain leaders as the most ‘open’ models.
➤ Granite 4.1 8B uses just 4M output tokens to run the Intelligence Index. This is ~20x fewer than Qwen3.5 9B (78M tokens), ~3x fewer than Ministral 3 8B (13M), and ~2x fewer than Gemma 4 E4B (8M). The pattern holds across the family: Granite 4.1 30B uses 4.6M output tokens (vs 7M for Gemma 4 31B and 25M for Qwen3.5 27B), and Granite 4.1 3B uses 2.7M.
➤ Token efficiency comes at the cost of intelligence relative to peer non-reasoning models. Granite 4.1 30B (15) trails leading peers like Qwen3.5 27B (37) and Gemma 4 31B (32). Granite 4.1 8B (12) trails Ministral 3 8B (15) and Gemma 4 E4B (15). Granite 4.1 3B (9) trails Gemma 4 E2B (12).
➤ Granite 4.1 30B and 3B both gain on the Intelligence Index over their Granite 4.0 predecessors. Granite 4.1 30B (15) gains 4 points over Granite 4.0 H Small (32B / 9B active, 11), with the largest gains in tool use (τ²-Bench: 42% vs 17%) and agentic tasks (GDPval-AA: 493 vs 344 Elo). Granite 4.1 3B (9) gains 1 point over Granite 4.0 Micro (8).
Other information:
➤ License: Apache 2.0 (open weights, permissive commercial use)
➤ Context window: 128K tokens
➤ Availability: Granite 4.1 8B is available via @WandB ($0.05/$0.1 per 1M input/output tokens) and @replicate. Weights for all three models are available via @huggingface.
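The "~20x / ~3x / ~2x fewer" figures for Granite 4.1 8B come from dividing each peer's reported output-token count by Granite's 4M. A minimal Python sketch of that arithmetic, using only the numbers quoted in the tweet; the names `token_ratios`, `GRANITE_8B`, and `PEERS_8B` are hypothetical and introduced only for illustration:

```python
# Output tokens (in millions) used to run the Intelligence Index,
# as quoted in the Artificial Analysis tweet (illustrative constants).
GRANITE_8B = 4.0
PEERS_8B = {"Qwen3.5 9B": 78.0, "Ministral 3 8B": 13.0, "Gemma 4 E4B": 8.0}


def token_ratios(baseline: float, peers: dict[str, float]) -> dict[str, float]:
    """Return how many times more output tokens each peer used than the baseline."""
    return {name: round(tokens / baseline, 2) for name, tokens in peers.items()}


if __name__ == "__main__":
    for name, ratio in token_ratios(GRANITE_8B, PEERS_8B).items():
        print(f"{name}: {ratio}x the output tokens of Granite 4.1 8B")
```

With these inputs the ratios are 19.5, 3.25, and 2.0, which round to the ~20x, ~3x, and ~2x stated above.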
Kush Varshney कुश वार्ष्णेय retweeted
Keshav Ramji @KeshavRamji
What if your language model could reason efficiently in an entirely new language? We introduce Abstract Chain-of-Thought, a new mechanism that uses reinforcement learning to let language models reason through a short sequence of reserved "abstract" tokens. It is as performant as verbalized CoT at a fraction of the cost, achieving major gains in inference-time efficiency.
Kush Varshney कुश वार्ष्णेय retweeted
Alex Bozarth @stbando
I've been working on an open source project called Mellea, and wrote a blog post about using it to automatically validate and fix Qiskit code generated by an LLM: mellea.ai/blogs/qiskit-i…
Percy Liang @percyliang
Academic titles are funny. After 14 years, I finally have the official title that people might have always assumed I had.
Timur Yessenov @Timur_Yessenov
@krvarshney disagree with the disagree tbh. humans absolutely hold contradictory moral beliefs - we just don't notice until someone points it out. the interesting question isn't whether llms should be consistent, it's whether moral inconsistency is actually a feature
Kush Varshney कुश वार्ष्णेय retweeted
Miriam Rateike @miriamrateike
We have extended our ICLR workshop deadline to Feb 5th! #AFAA2026 Submit your work on fairness across alignment & agentic AI systems. We also continue to accept broad work on fairness. CfP: afciworkshop.org/call-for-papers
AFAA 2026 @ ICLR @afciworkshop

🚨 Deadline Extended to Feb 5 (AoE)! CFP still OPEN for the #AFAA2026 Workshop at @iclr_conf — on fairness across alignment & agentic AI systems. Full & tiny papers welcome • Interdisciplinary work encouraged! 🔗 afciworkshop.org #ICLR2026 #AFAA2026

Kush Varshney कुश वार्ष्णेय retweeted
Chappaqua Central School District @chappaqua_csd
4th graders welcomed RB parent Kush R. Varshney, an IBM Fellow who volunteered his time to explain how AI works—its benefits and pitfalls—with a tailored presentation featuring our school song and a Charlotte’s Web excerpt. Grateful for his generosity & expertise! #WeAreChappaqua
Kush Varshney कुश वार्ष्णेय retweeted
Satyapriya Krishna @SatyaScribbles
Grateful to have co-hosted the Trusted AI Symposium yesterday. Left with so many new ideas from the posters, panels, and lectures. 🧠 Big thanks to our keynote speakers, panelists, and staff for driving the conversation on trust in AI.🤝 #TrustedAISymposium2026
Kush Varshney कुश वार्ष्णेय retweeted
rishi @RishiBommasani
How transparent are major AI companies? We answer this question each year in the annual Foundation Model Transparency Index. While the AI industry as a whole is quite opaque, we found a huge spread. @IBM scored a 95/100 while @xai scored 14/100. So what's going on? 🧵
Kush Varshney कुश वार्ष्णेय retweeted
AAAI @RealAAAI
As a follow-up to the 2025 AAAI report on the future of AI research, we are organizing several deep dives into some of the topics covered by the report. The next webinar will be on December 11th at 1pm Eastern time and features AI experts who will discuss AI Factuality and Trustworthiness. Register here: aaaiforms.wufoo.com/forms/q47sc7n1…
Topics include: Understanding Factuality, Beyond Accuracy, and Practical Solutions.
Our Expert Panel:
- Oren Etzioni, Founder, TrueMedia.org; Professor Emeritus, University of Washington
- Henry Kautz, Professor of Computer Science, University of Virginia at Charlottesville
- Kush Varshney, IBM Fellow and Co-Director, IBM Science for Social Good
Moderated by: Francesca Rossi, AAAI past president, IBM Fellow and AI Ethics Global Leader.
Whether you're an AI professional, researcher, or simply curious about the future of AI, this discussion will offer valuable insights into one of technology's most consequential frontiers.