Merlin Stein

23 posts

@merlinstein_

Frontier AI Evaluations & Monitoring | UK AISI | PhD candidate @Oxford | ex-EU AIO

Oxford/London, UK · Joined December 2023
39 Following · 88 Followers

Tarik Hammadou @thammadou
Great work by @merlinstein_ and @AISecurityInst — first empirical ground truth for the agentic AI ecosystem. The MCP telemetry approach to monitoring agent deployment is exactly what the field needs.

Tarik Hammadou @thammadou
Fascinating read: "How Are AI Agents Used? Evidence from 177,000 MCP Tools" by Merlin Stein, Oxford/UK AI Security Institute. First large-scale empirical analysis of the agentic AI ecosystem using real deployment telemetry from 177,436 MCP server tools tracked Nov 2024 → Feb 2026.
The taxonomy is clean:
• Perception tools: read/access data
• Reasoning tools: analyze data
• Action tools: modify external environments (file edits, emails, API calls, device control)
Key data points:
• Software development dominates: 67% of all agent tools, 90% of MCP server downloads. Coding agents are the killer app of the agentic era.
• Action tools grew from 27% → 65% of total usage in 16 months. Agents aren't just reading anymore: they're writing to file systems, triggering financial transactions, and controlling physical devices.
• The paper uses O*NET task mapping to score consequentiality. Most action tools today are medium-stakes (file editing, code commits), but higher-stakes tools for financial transactions and system administration are growing fast.
• The policy contribution: governments should monitor the tool layer (MCP servers), not just model outputs. The risk surface has shifted from what the model says to what the agent does.
Great work grounding the agentic AI conversation in actual deployment data instead of benchmarks. This is the kind of empirical foundation the field needs as we move from prototype to production. arxiv.org/abs/2603.23802
#AgenticAI #MCP #LLM #AIAgents #AISafety #NVIDIADev #GTC2026 #MultiAgentSystems #DevTools #AIGovernance

Merlin Stein @merlinstein_
Compared to Jan 2025, there are now:
• >30x more agent tools published
• >150x more installations of agent tools
• >60% (vs. 30%) of tool uses are 'actions' rather than 'perception'
The action space of agents is rapidly increasing. More in my new paper: arxiv.org/abs/2603.23802…

Quoting AI Security Institute @AISecurityInst

🔍 How are AI agents used in the real world? We analysed 177,000+ agent tools published between November 2024 and February 2026 and found rapid growth in deployment for increasingly complex tasks. Learn more ⬇️


Merlin Stein @merlinstein_
@tomekkorbak Thank you ;) More detailed figures on the agent ecosystem will follow in upcoming publications.

Merlin Stein retweeted
AI Security Institute @AISecurityInst
📈 Today, we’re releasing our first Frontier AI Trends Report: evaluation results on 30+ frontier models from the past two years, showing rapid progress in chemistry and biology, cyber capabilities, autonomy, and more. ▶️Read now: aisi.gov.uk/frontier-ai-tr…

Merlin Stein retweeted
Markus Anderljung @Manderljung
The EU's Code of Practice for General-Purpose AI is out. As one of the co-chairs who drafted the Safety & Security Chapter, focused on frontier AI, I'm proud of what we've put together. It’s a lean but effective framework for frontier AI companies to comply with the AI Act.

Merlin Stein @merlinstein_
New: Code inspections to assess agent autonomy. Idea: scan the code to pre-filter, by autonomy level, which agents to assess in more depth (e.g. via runtime evaluations), and to monitor open-source agent development. arxiv.org/abs/2502.15212 w/ @pcihon @bansalg_ @sj_manning & Kevin Xu

Merlin Stein @merlinstein_
Want to make AI safe & helpful?
✔️ 3 years of work or research experience in AI or digital topics relevant to EU policy?
✔️ EU citizen?
✔️ Apply by Jan. 15 to join one of the most exciting teams in AI governance: the EU AI Office. eu-careers.europa.eu/en/job-opportu…

Merlin Stein @merlinstein_
Eligibility: every organisation that has evaluated GPAI models accessible in the EU (like Llama or GPT-4o ...). Selection: quality and EU relevance of your best paper (or a summary of it) on your evaluation of a particular risk. Apply by Dec. 8: EUSurvey - Survey

Merlin Stein retweeted
Connor Dunlop @cp_dunlop
Post-deployment information sharing would allow the AI ecosystem to jointly improve its understanding of AI impacts 'in the wild' and to contribute to evidence-based policy and risk management. With @merlinstein_ & @The_JBernardi we make the case to increase these info flows by:

Merlin Stein @merlinstein_
Participate in specifying EU rules for advanced AI on evals, monitoring, model card transparency, responsible scaling policies, etc. The drafting of the "codes of practice" is now open to stakeholder input (lnkd.in/d76iy6iD) and participation (lnkd.in/dfnTMeWJ).

Merlin Stein @merlinstein_
Excited to have joined the #EUAIOffice in #Brussels on an expert secondment. Focus: general-purpose AI evaluations, monitoring & codes of practice. #AIAct implementation. So grateful to apply some of my PhD research in practice and to understand EU AI priorities.

Merlin Stein @merlinstein_
Grateful to have contributed to this research memo: AISIs can shape AI governance by providing its information basis. Audits, evaluations and monitoring, done by AISIs and the wider ecosystem, reduce the unknowns.

Quoting Oxford Martin AI Governance Initiative @aigioxford

New research memo! The AI Safety Institutes (AISIs) are poised to assume an increasingly significant role in the governance of advanced AI. We recently held an expert workshop to explore the roles AISIs can play. Find out more here @oxmartinschool oxfordmartin.ox.ac.uk/publications/a…


Merlin Stein @merlinstein_
Systemic risks of general-purpose AI and AI agents might materialize in finance. As with flash crashes in algorithmic trading, the integration of AI agents might lead to correlated risks. Maybe. Regulators need visibility. Scenarios in my new piece with the BIS & podcast: bis.org/publ/work1194.…

Quoting The Investment Association @InvAssoc

We’re excited to share the latest episode of our podcast, IA talks AI, discussing financial stability and systemic risks from the use of AI in asset management. Host @john_allan_ia sits down with @merlinstein_, AI governance researcher at the University of Oxford and co-author of the recent Bank for International Settlements – @BIS_org Working Paper on AI to outline the paper’s findings on the potential of an ‘intelligent financial system’ transformed by AI, and what it means for future regulatory policy – and asset managers’ roles. Click here to listen to the full episode, available now on Spotify and Apple Podcasts: theia.org/news/ia-talks-… Click here to read the BIS Working Paper on AI: bis.org/publ/work1194.…


Merlin Stein @merlinstein_
New: "Safe beyond sale: Post-deployment monitoring for advanced AI governance." with @cp_dunlop
• Why? 1) Pre-deployment evals might fail; 2) AI capabilities and risks change depending on the usage context.
• How? An FMTI (Foundation Model Transparency Index), but extended & mandatory.
adalovelaceinstitute.org/blog/post-depl…