Gagan Bansal (@bansalg_) - Twitter Profile

Pinned Tweet

📢 Networks of agents are the future, and in order to make them useful at scale, they must remain secure. So, we deployed an internal platform where every agent was always-on, had a known human "principal" (MS employee) that it reported to, and could interact w/ other agents via shared forums, DMs, and social apps like wallet, marketplace, and calendar. This created a long-running network of agents! Then we collaborated with Microsoft's amazing red-teaming team to "crack it" and help understand its vulnerabilities. This blog captures some of our understanding of what happened and how we are thinking about the future.

Microsoft Research@MSFTResearch

Safe agents don’t guarantee a safe ecosystem of interconnected agents. Microsoft Research examines what breaks when AI agents interact and why network-level risks require new approaches. Learn more: microsoft.com/en-us/research…

Gagan Bansal retweeted

Valerie Chen@valeriechen_·Jun 16

📢Call for abstract submissions: 2026 Workshop on Human-AI Complementarity for Decision Making at CMU This year's theme: Dynamic Human-AI Alignment. How do we design AI systems that don't just emulate static human preferences, but actively and effectively coordinate with humans over time? 📅Workshop Dates: September 24-25, 2026 🗺️Workshop Location: Pittsburgh, PA, USA ⌛️ Application Deadline: Friday, July 17, 2026 Funding for travel/lodging available to support speakers + student presenters.

Gagan Bansal@bansalg_·Jul 8

Very cool stuff!

Thomas Dohmke@ashtom

42 is the answer to everything. This is @EntireHQ’s answer to Git in the era of agents: fast, independent, distributed. Mirror your GitHub repos, let your agents clone and pull from the region(s) of your choice, and…we're open sourcing it. 🤖

Gagan Bansal@bansalg_·Jul 6

If you are at ICML, check out cool work by @iamwaynechi

Wayne Chi ✈️ ICML@iamwaynechi

I'll be presenting GameDevBench at ICML tomorrow! If you're interested in agents and games, come check us out at poster session 2 from 2 PM to 3:45 PM!

Gagan Bansal retweeted

Markus J. Buehler@ProfBuehlerMIT·Jun 25

x.com/i/article/2070…

Gagan Bansal retweeted

Jessica Hullman@JessicaHullman·Jun 24

Using AI to support peer review seems unavoidable, but what quality checks should AI implement? We can take some lessons from metascience on the hard reward design problem that is AI review. I wrote a paper synthesizing a few points the emerging lit is at risk of confusing. 1/

Valerie Chen@valeriechen_·Jun 24

As my time in Pittsburgh comes to an end... I'm excited to share that I will be joining UIUC as an Assistant Professor in fall 2027! I will recruit PhD students in both ECE and CS. I’m looking for students and postdocs who are excited to continue building towards collaborative AI systems that learn and adapt through interaction.

Gagan Bansal@bansalg_·Jun 24

@valeriechen_ Congratulations Valerie!

Gagan Bansal retweeted

Eric Horvitz@erichorvitz·Jun 18

Test of time awards are the most impressive honors for publications. Longuet-Higgins is the 10yr impact award for CVPR papers. Congrats to the ResNet team. Seems like yesterday. @MSFTResearch

Microsoft Research@MSFTResearch

ResNet has received the Longuet-Higgins Prize at CVPR 2026, recognizing research with proven, lasting impact. A decade after its publication, residual connections remain foundational to how modern AI systems are built, with over 320,000 citations and growing. msft.it/6017vl6bl

Gagan Bansal retweeted

Richard Socher@RichardSocher·Jun 10

In a week when some of the leaders in AI are trying to pull up the ladder behind them and prevent the automation of science and self-improving superintelligence, we're committed to building RSI safely and publicizing the outputs of our system to give humanity an audit trail of its inventions and intentions and let the open source community build on top of them. Stay tuned for the first such result in the coming days.

Gagan Bansal retweeted

Shriram Krishnamurthi (primary: Bluesky)@ShriramKMurthi·Jun 9

I've spent months rethinking and rebuilding my programming languages course from scratch for the agentic coding era. I wrote myself a memo explaining what I'm doing. I figure others might be interested in the redesign, so here goes! Feedback welcome ofc. docs.google.com/document/d/e/2…

Gagan Bansal retweeted

clem 🤗@ClementDelangue·Jun 10

Concentration of power, capabilities and economic wealth is the biggest risk in AI. We need open science and open-source more than ever!

Gagan Bansal retweeted

Julien Chaumond@julien_c·Jun 9

Very dystopian ngl

elie@eliebakouch

mythos will be bad ON PURPOSE on ai "frontier llm research" tasks, this is very very sad for the research community also the fact that this is on purpose not visible to the user is crazy

Gagan Bansal retweeted

elie@eliebakouch·Jun 9

mythos will be bad ON PURPOSE on ai "frontier llm research" tasks, this is very very sad for the research community also the fact that this is on purpose not visible to the user is crazy

Claude@claudeai

Introducing Claude Fable 5: a Mythos-class model that we’ve made safe for general use. Its capabilities exceed those of any model we’ve ever made generally available.

Gagan Bansal retweeted

Graham Neubig@gneubig·Jun 9

First they came for the model builders... I feel we're getting a glimpse of a future where AI is only provided to a privileged few, and that's not a future I want to live in.

elie@eliebakouch

mythos will be bad ON PURPOSE on ai "frontier llm research" tasks, this is very very sad for the research community also the fact that this is on purpose not visible to the user is crazy

Gagan Bansal@bansalg_·Jun 5

@vykthur Congrats @vykthur

Victor Dibia@vykthur·Jun 5

It's Microsoft Build week, and I'm excited to share something my team and I have been building: Agent Optimization, now in private preview on Microsoft Foundry.

Gagan Bansal@bansalg_·Jun 3

@kylelostat congrats @kylelostat

Kyle Lo@kylelostat·Jun 2

happy to share another quality tech report w/ the wider research community 🫶 great read for ppl who want to see all the details for methods + infra for scaling up pretraining & RL, esp detailed discussion about data which is often kept vague by other labs

Mustafa Suleyman@mustafasuleyman

Super excited to announce seven new world-class MAI models today. They represent what we consider a new era in AI designed to keep you in control and on the frontier.
First is our text foundation model, MAI-Thinking-1, exceptionally strong on reasoning and SWE tasks.
- It’s a 35B active parameter MoE with a 256K context window. Independent human raters on Surge prefer it for overall quality in blind side-by-sides versus Sonnet 4.6, and it’s achieved 97% on AIME 2025, the key measure of its general-purpose reasoning abilities.
- It's at 53% on SWE Bench Pro, placing it right alongside Opus 4.6 on one of the toughest coding benchmarks.
- And since we co-designed our models with our own silicon, MAI-Thinking-1 is optimized on our MAIA 200 chip. Benchmarking head-to-head against the GB200, we see 30% better performance per dollar as well as a 1.4x performance-per-watt gain when running our MAI models on the MAIA 200 end-to-end.
Next is MAI-Image-2.5 and its Flash variant. Two super strong models now at #2 on the leaderboards, surpassing the score of Nano Banana 2 on image editing.
Last for now is MAI-Code-1-Flash, our new inference efficient coding model, especially tuned for VS Code and GitHub Copilot CLI.
- Code-1-Flash achieves 51% on SWE Bench Pro, despite having just 5B parameters, putting it closer to Haiku in size but cheaper in cost.
All of this is the foundation for Microsoft Frontier Tuning. It lets you customize our models to create custom, company-specific agents that only you control. You can make our model, your model. Your data. Your agents. Your moat.
Early adopters are already seeing a difference. When we tuned our models for McKinsey’s tasks, MAI delivered the highest win rate, outperforming GPT-5.5 on quality, while being 10x lower on cost. Also really excited to be collaborating with the amazing team at Mayo Clinic to jointly train a new frontier AI model for healthcare.
Our announcements today mark another milestone on the road to humanist superintelligence. You can learn more and about our other new models in our latest blog: microsoft.ai/news/building-…

Gagan Bansal retweeted

Hanna Hajishirzi@HannaHajishirzi·Jun 2

MAI-Thinking-1 is out! Excited to share what we are building and how climbing from scratch (no distillation) actually works: simple recipes, rigorous science, self-distillation, patience, and great infra. Check out our tech report, which has the full story of our RL climbs. microsoft.ai/wp-content/upl…

Mustafa Suleyman@mustafasuleyman

Super excited to announce seven new world-class MAI models today. They represent what we consider a new era in AI designed to keep you in control and on the frontier.
First is our text foundation model, MAI-Thinking-1, exceptionally strong on reasoning and SWE tasks.
- It’s a 35B active parameter MoE with a 256K context window. Independent human raters on Surge prefer it for overall quality in blind side-by-sides versus Sonnet 4.6, and it’s achieved 97% on AIME 2025, the key measure of its general-purpose reasoning abilities.
- It's at 53% on SWE Bench Pro, placing it right alongside Opus 4.6 on one of the toughest coding benchmarks.
- And since we co-designed our models with our own silicon, MAI-Thinking-1 is optimized on our MAIA 200 chip. Benchmarking head-to-head against the GB200, we see 30% better performance per dollar as well as a 1.4x performance-per-watt gain when running our MAI models on the MAIA 200 end-to-end.
Next is MAI-Image-2.5 and its Flash variant. Two super strong models now at #2 on the leaderboards, surpassing the score of Nano Banana 2 on image editing.
Last for now is MAI-Code-1-Flash, our new inference efficient coding model, especially tuned for VS Code and GitHub Copilot CLI.
- Code-1-Flash achieves 51% on SWE Bench Pro, despite having just 5B parameters, putting it closer to Haiku in size but cheaper in cost.
All of this is the foundation for Microsoft Frontier Tuning. It lets you customize our models to create custom, company-specific agents that only you control. You can make our model, your model. Your data. Your agents. Your moat.
Early adopters are already seeing a difference. When we tuned our models for McKinsey’s tasks, MAI delivered the highest win rate, outperforming GPT-5.5 on quality, while being 10x lower on cost. Also really excited to be collaborating with the amazing team at Mayo Clinic to jointly train a new frontier AI model for healthcare.
Our announcements today mark another milestone on the road to humanist superintelligence. You can learn more and about our other new models in our latest blog: microsoft.ai/news/building-…

Gagan Bansal@bansalg_·May 22

👇👇👇

Karthik Narasimhan@karthik_r_n

Deep work and deep thinking will be increasingly valuable in a world where AI agents automate a lot of knowledge work. Spawning tens of agents in parallel and context switching between them feels productive, but it's mostly dopamine.

Gagan Bansal retweeted

Hussein Mozannar@HsseinMzannar·May 21

We're releasing a very capable browser-use model, Fara1.5-9B, that feels like a step-change in small CUA model capability, achieving 63% on OnlineM2W auto-eval. We've put in a lot of work to make it useful for all types of web tasks. microsoft.com/en-us/research…

Gagan Bansal retweeted

Diyi Yang@Diyi_Yang·May 20

The next frontier of AI is not only a more capable model; it is an AI that *humans* can meaningfully live and work with :) With all students in my cs329x Human-Centered LLM class, we present 60+ pages of insights for developing Human-Centered LLMs (HCLLMs), from design & data sourcing to training, eval & deployment 🧵
