Merlin Stein

23 posts

@merlinstein_

Frontier AI Evaluations & Monitoring | UK AISI | PhD candidate @Oxford | ex-EU AIO

Oxford/London, UK · Joined December 2023

39 Following · 88 Followers

Merlin Stein@merlinstein_·27 Mar

@prpaskov thank you!

Patricia Paskov@prpaskov·26 Mar

@merlinstein_ has outdone himself on this one, can't wait to dig in

AI Security Institute@AISecurityInst

🔍How are AI agents used in the real world? We analysed 177,000+ agent tools published between November 2024 and February 2026 and found rapid growth in deployment for increasingly complex tasks. Learn more ⬇️

100

Merlin Stein@merlinstein_·27 Mar

@thammadou @AISecurityInst thank you!

Tarik Hammadou@thammadou·27 Mar

Great work by @merlinstein_ and @AISecurityInst — first empirical ground truth for the agentic AI ecosystem. The MCP telemetry approach to monitoring agent deployment is exactly what the field needs.

Tarik Hammadou@thammadou·27 Mar

Fascinating read: "How Are AI Agents Used? Evidence from 177,000 MCP Tools" by Merlin Stein, Oxford/UK AI Security Institute. First large-scale empirical analysis of the agentic AI ecosystem using real deployment telemetry from 177,436 MCP server tools tracked Nov 2024 → Feb 2026.
The taxonomy is clean:
• Perception tools: read/access data
• Reasoning tools: analyze data
• Action tools: modify external environments (file edits, emails, API calls, device control)
Key data points:
• Software development dominates: 67% of all agent tools, 90% of MCP server downloads. Coding agents are the killer app of the agentic era.
• Action tools grew from 27% → 65% of total usage in 16 months. Agents aren't reading anymore; they're writing to file systems, triggering financial transactions, and controlling physical devices.
• The paper uses O*NET task mapping to score consequentiality. Most action tools today are medium-stakes (file editing, code commits), but higher-stakes tools for financial transactions and system administration are growing fast.
• The policy contribution: governments should monitor the tool layer (MCP servers), not just model outputs. The risk surface has shifted from what the model says to what the agent does.
Great work grounding the agentic AI conversation in actual deployment data instead of benchmarks. This is the kind of empirical foundation the field needs as we move from prototype to production.
arxiv.org/abs/2603.23802
#AgenticAI #MCP #LLM #AIAgents #AISafety #NVIDIADev #GTC2026 #MultiAgentSystems #DevTools #AIGovernance

Merlin Stein@merlinstein_·27 Mar

Compared to Jan 2025, there are now:
• >30x more agent tools published
• >150x more installations of agent tools
• >60% (vs. 30%) of tool uses are 'actions' vs. 'perception'
The action space of agents is rapidly increasing. More in my new paper: arxiv.org/abs/2603.23802…

AI Security Institute@AISecurityInst

149

Merlin Stein@merlinstein_·19 Dec

@tomekkorbak Thank you ;) More detailed figures on the agent ecosystem soon in upcoming publications

Tomek Korbak@tomekkorbak·18 Dec

This report is a treasure trove of great figures!

AI Security Institute@AISecurityInst

📈 Today, we’re releasing our first Frontier AI Trends Report: evaluation results on 30+ frontier models from the past two years, showing rapid progress in chemistry and biology, cyber capabilities, autonomy, and more. ▶️Read now: aisi.gov.uk/frontier-ai-tr…

1.5K

Merlin Stein reposted

AI Security Institute@AISecurityInst·18 Dec

159

64.2K

Merlin Stein reposted

Markus Anderljung@Manderljung·10 Jul

The EU's Code of Practice for General-Purpose AI is out. As one of the co-chairs who drafted the Safety & Security Chapter, focused on frontier AI, I'm proud of what we've put together. It’s a lean but effective framework for frontier AI companies to comply with the AI Act.

5.8K

Merlin Stein@merlinstein_·24 Feb

New: Code Inspections to assess agent autonomy. Idea: Scan the code to pre-filter by autonomy levels which agents to assess more in-depth (e.g. via runtime evaluations) & to monitor open source agent developments arxiv.org/abs/2502.15212 w/ @pcihon @bansalg_ @sj_manning & Kevin Xu

441

Merlin Stein@merlinstein_·13 Dec

Want to make AI safe & helpful? ✔️ 3 years of work or research experience related to AI or digital topics that are relevant for EU policy? ✔️ EU citizen? ✔️ Apply by Jan. 15 to join one of the most exciting teams in AI governance: The EU AI Office. eu-careers.europa.eu/en/job-opportu…

144

Merlin Stein@merlinstein_·25 Nov

Apply by Dec 8 here: ec.europa.eu/eusurvey/runne…

Merlin Stein@merlinstein_·25 Nov

Eligibility: Every org that has evaluated GPAI models that are accessible in the EU (like Llama or GPT-4o ...). Selection: Quality and EU relevance of your best paper (summary) about your eval of a particular risk. Apply by Dec. 8: EUSurvey - Survey

Merlin Stein@merlinstein_·25 Nov

Are you an AI eval org or AI eval research lab? New opportunity to work together with the European AI Office! Apply for the workshop & be invited for further technical exchange with the AI Office. digital-strategy.ec.europa.eu/en/news/call-e…

127

Merlin Stein@merlinstein_·14 Nov

13 independent chairs finished the first draft of ‘EU rules’ on advanced AI, as part of the code of practice process: digital-strategy.ec.europa.eu/en/library/fir…

109

Merlin Stein reposted

Connor Dunlop@cp_dunlop·9 Oct

Post-deployment information sharing would allow the AI ecosystem to jointly enhance understanding of AI impacts 'in the wild', and contribute to evidence-based policy and risk management. With @merlinstein_ & @The_JBernardi we make the case to increase these info flows by:

2.3K

Merlin Stein@merlinstein_·30 Jul

Participate in specifying EU rules for advanced AI on evals, monitoring, model card transparency, responsible scaling policies, …. The drawing up of the “codes of practice” is now open to input (lnkd.in/d76iy6iD) & participation by stakeholders: (lnkd.in/dfnTMeWJ)

198

Merlin Stein@merlinstein_·17 Jul

Excited to have joined the #EUAIOffice in #Brussels on an expert secondment. Focus: general-purpose AI evaluations, monitoring & codes of practice. #AIAct implementation. So grateful to apply some of my PhD research in practice & understand EU AI priorities.

204

Merlin Stein@merlinstein_·16 Jul

Grateful to contribute to this research memo: AISIs can shape AI governance by providing the information basis. Audits, evaluations and monitoring reduce the unknowns. Done by AISIs & the ecosystem.

Oxford Martin AI Governance Initiative@aigioxford

New research memo! The AI Safety Institutes (AISIs) are poised to assume an increasingly significant role in the governance of advanced AI. We recently held an expert workshop to explore the roles AISIs can play. Find out more here @oxmartinschool oxfordmartin.ox.ac.uk/publications/a…

119

Merlin Stein@merlinstein_·4 Jul

@AdaLovelaceInst @cp_dunlop Thank you for the great collab!

Ada Lovelace Institute@AdaLovelaceInst·3 Jul

📢New blog post! @merlinstein_ and @cp_dunlop examine why post-deployment monitoring of AI models is needed, who and what can be monitored, and how monitoring could be improved for the benefit of people and society. adalovelaceinstitute.org/blog/post-depl…

5.4K

Merlin Stein@merlinstein_·4 Jul

In that context: enjoyed reading related work from @zittrain on the similarity between algorithmic trading & AI agents. What will circuit breakers for AI agents look like? theatlantic.com/technology/arc…

126

Merlin Stein@merlinstein_·4 Jul

Systemic risks of general-purpose AI and AI agents might materialize in finance. Like algo trading flash crashes, integration of AI agents might lead to correlated risks - Maybe. Regulators need visibility. Scenarios in my new piece with the BIS & podcast: bis.org/publ/work1194.…

The Investment Association@InvAssoc

We’re excited to share the latest episode of our podcast, IA talks AI, discussing financial stability and systemic risks from the use of AI in asset management. Host @john_allan_ia sits down with @merlinstein_, AI governance researcher at the University of Oxford and co-author of the recent Bank for International Settlements (@BIS_org) Working Paper on AI, to outline the paper’s findings on the potential of an ‘intelligent financial system’ transformed by AI, and what it means for future regulatory policy and asset managers’ roles. Click here to listen to the full episode, available now on Spotify and Apple Podcasts: theia.org/news/ia-talks-… Click here to read the BIS Working Paper on AI: bis.org/publ/work1194.…

597

Merlin Stein@merlinstein_·28 Jun

New: "Safe beyond sale: Post-deployment monitoring for advanced AI governance." with @cp_dunlop - Why? 1) Pre-deployment evals might fail 2) AI capabilities and risks change depending on usage context - How? An FMTI, but extended & mandatory adalovelaceinstitute.org/blog/post-depl…

9.3K
