Adrian Brasoveanu

5K posts

@AdrianB82

Researcher @ MODUL University Vienna | NLP ■ Knowledge Graphs ■ Machine Learning ■ Information Visualization

Vienna · Joined March 2011
2.7K Following · 591 Followers
Adrian Brasoveanu reposted
Jenny Zhang @jennyzhangzt
Introducing Hyperagents: an AI system that not only improves at solving tasks, but also improves how it improves itself.

The Darwin Gödel Machine (DGM) demonstrated that open-ended self-improvement is possible by iteratively generating and evaluating improved agents, yet it relies on a key assumption: that improvements in task performance (e.g., coding ability) translate into improvements in the self-improvement process itself. This alignment holds in coding, where both evaluation and modification are expressed in the same domain, but breaks down more generally. As a result, prior systems remain constrained by fixed, handcrafted meta-level procedures that do not themselves evolve.

We introduce Hyperagents, self-referential agents that can modify both their task-solving behavior and the process that generates future improvements. This enables what we call metacognitive self-modification: learning not just to perform better, but to improve at improving. We instantiate this framework as DGM-Hyperagents (DGM-H), an extension of the DGM in which both task-solving behavior and the self-improvement procedure are editable and subject to evolution.

Across diverse domains (coding, paper review, robotics reward design, and Olympiad-level math solution grading), hyperagents enable continuous performance improvements over time and outperform baselines without self-improvement or open-ended exploration, as well as prior self-improving systems (including DGM). DGM-H also improves the process by which new agents are generated (e.g. persistent memory, performance tracking), and these meta-level improvements transfer across domains and accumulate across runs.

This work was done during my internship at Meta (@AIatMeta), in collaboration with Bingchen Zhao (@BingchenZhao), Wannan Yang (@winnieyangwn), Jakob Foerster (@j_foerst), Jeff Clune (@jeffclune), Minqi Jiang (@MinqiJiang), Sam Devlin (@smdvln), and Tatiana Shavrina (@rybolos).
139 replies · 582 reposts · 3.2K likes · 310.4K views
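To make the abstract's central move concrete, here is a minimal, self-contained sketch of an archive-based self-improvement loop in which the improvement operator's own parameters are among the things being edited, so the system can change how it changes itself. All names, the scalar fitness, and the mutation scheme are hypothetical illustrations, not the DGM-H implementation.

```python
import random

def evaluate(agent):
    # Stand-in for benchmark performance (coding, review, grading, ...).
    return agent["skill"]

def propose_child(agent):
    # The improvement operator. Note it reads the agent's OWN
    # "mutation_scale": a meta-level parameter that is itself editable.
    child = dict(agent)
    child["skill"] += random.gauss(0, agent["mutation_scale"])   # task-level edit
    if random.random() < 0.2:                                    # meta-level edit:
        child["mutation_scale"] *= random.choice([0.5, 2.0])     # change how we change
    return child

archive = [{"skill": 0.0, "mutation_scale": 0.1}]
for _ in range(200):
    parent = max(archive, key=evaluate)       # pick the current best as parent
    child = propose_child(parent)
    if evaluate(child) > evaluate(parent):    # keep only verified improvements
        archive.append(child)

best = max(archive, key=evaluate)
print(f"best skill: {best['skill']:.2f}, final mutation_scale: {best['mutation_scale']:.3f}")
```

The key line is the one rescaling mutation_scale: a meta-level edit that alters how all future task-level edits are generated, which is the "improving at improving" the thread describes.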
Adrian Brasoveanu reposted
Dora Demszky @ddemszky
🧵New paper led by @meiflwr_ with @lenaphalen and me at #LAK2026: “Marked Pedagogies.” When LLM feedback tools are told a student is Black, an ELL, low-achieving, or has a disability, how does the feedback change, even when the essay is identical?
5 replies · 3 reposts · 14 likes · 549 views
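The experimental design the tweet hints at is easy to sketch: hold the essay fixed, vary only the student descriptor, and compare the feedback. A minimal outline, assuming a stand-in get_feedback function and crude surface metrics (both hypothetical; the paper's actual measures will differ):

```python
ESSAY = "The industrial revolution changed cities because ..."  # identical in every condition

CONDITIONS = {
    "unmarked": "Give feedback on this student essay.",
    "marked": ("Give feedback on this essay, written by a Black student "
               "who is an ELL, low-achieving, and has a disability."),
}

def get_feedback(instruction: str, essay: str) -> str:
    # Stand-in for the LLM feedback tool under audit; swap in a real API call.
    return f"[model feedback given: {instruction}]"

feedback = {name: get_feedback(instr, ESSAY) for name, instr in CONDITIONS.items()}
for name, text in feedback.items():
    # Crude surface measures; the actual paper presumably uses richer ones.
    hedges = sum(text.lower().count(w) for w in ("simple", "basic", "effort"))
    print(f"{name}: {len(text.split())} words, {hedges} markedness cues")
```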
Adrian Brasoveanu reposted
Prakash Sharma @PrakashS720
🚨Breaking: An Anthropic engineer (@trq212) just broke down how they actually use skills inside Claude Code, and it's a completely different mindset. Here's the real system 👇

Skills are NOT text files. They are modular systems the agent can explore and execute. Each skill can include: reference knowledge (APIs, libraries), executable scripts, datasets & queries, workflows & automation. The agent doesn't just read them, it uses them.

The best teams don't create random skills. They design them into clear categories:
• Knowledge skills → teach APIs, CLIs, systems
• Verification skills → test flows, assert correctness
• Data skills → fetch, analyze, compare signals
• Automation skills → run repeatable workflows
• Scaffolding → generate structured code
• Review systems → enforce quality & standards
• CI/CD → deploy, monitor, rollback
• Runbooks → debug real production issues
• Infra ops → manage systems safely
Each skill has a single responsibility.

The biggest unlock is verification. Most people stop at generation. Top teams build systems that simulate real usage, run assertions, and check logs & outputs. This is what makes agents reliable.

Great skills are not static. They evolve. They capture edge cases, failures, and "gotchas": every mistake becomes part of the system.

Another thing most people miss: skills are folders, not files. This allows progressive disclosure, structured context, and better reasoning. The filesystem becomes part of the agent's brain.

And the biggest mistake? Trying to control everything: rigid prompts, micromanagement, over-constraints. Instead, provide structure, give high-signal context, and allow flexibility. Let the agent adapt to the problem.

The best teams treat skills like internal products: reusable, composable, shareable across the org. That's how you scale agents. Not with better prompts, but with better systems. Save this. This is how AI actually gets useful.
15 replies · 38 reposts · 191 likes · 17.9K views
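A minimal sketch of the "skills are folders, not files" and progressive-disclosure points, assuming the SKILL.md-per-folder layout Anthropic documents for Claude skills; the loader itself is illustrative, not Anthropic's code:

```python
from pathlib import Path

SKILLS_DIR = Path("skills")   # e.g. skills/pdf-review/{SKILL.md, scripts/, references/}

def list_skills() -> list[str]:
    # Cheap index: only names, small enough to sit in context permanently.
    if not SKILLS_DIR.is_dir():
        return []
    return sorted(p.name for p in SKILLS_DIR.iterdir() if (p / "SKILL.md").is_file())

def load_skill(name: str) -> dict:
    # Full disclosure happens only when a skill is actually invoked.
    root = SKILLS_DIR / name
    return {
        "instructions": (root / "SKILL.md").read_text(),
        "scripts": sorted(str(p) for p in root.glob("scripts/*")),       # executable tools
        "references": sorted(str(p) for p in root.glob("references/*")), # API docs, datasets
    }

print(list_skills())   # the agent scans names first, then loads one lazily
```

The two-stage access pattern is the point: the names list stays in context at all times, while a skill's full instructions, scripts, and references enter context only on demand.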
Adrian Brasoveanu reposted
Ihtesham Ali @ihtesham2005
🚨 Holy shit... A developer on GitHub just built a full development methodology for AI coding agents, and it has 40.9K stars on GitHub. It's called Superpowers, and it completely changes how your AI agent writes code.

Right now, most people fire up Claude Code or Codex and just... let it go. The agent guesses what you want, writes code before understanding the problem, skips tests, and produces spaghetti you have to babysit. Superpowers fixes all of that.

Here's what happens when you install it:
→ Before writing a single line, the agent stops and brainstorms with you. It asks what you're actually trying to build, refines the spec through questions, and shows it to you in chunks short enough to read.
→ Once you approve the design, it creates an implementation plan so detailed that "an enthusiastic junior engineer with poor taste and no judgement" could follow it.
→ Then it launches subagent-driven development. Fresh subagents per task. Two-stage code review after each one (spec compliance, then code quality). The agent can run autonomously for hours without deviating from your plan.
→ It enforces true test-driven development. Write failing test → watch it fail → write minimal code → watch it pass → commit. It literally deletes code written before tests.
→ When tasks are done, it verifies everything, presents options (merge, PR, keep, discard), and cleans up.

The philosophy is brutal: systematic over ad-hoc. Evidence over claims. Complexity reduction. Verify before declaring success.

Works with Claude Code (plugin install), Codex, and OpenCode. This isn't a prompt template. It's an entire operating system for how AI agents should build software. 100% open source. MIT License.
207 replies · 686 reposts · 6.2K likes · 922.2K views
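The red/green discipline in the fourth bullet can be expressed as a small gate: the test must fail before implementation exists and pass after, or the change is rejected. A sketch assuming pytest and git are available on the machine; this illustrates the policy, not Superpowers' actual plugin code:

```python
import subprocess

def tests_pass(test_path: str) -> bool:
    # Exit code 0 from pytest means every test in the file passed.
    return subprocess.run(["pytest", test_path, "-q"]).returncode == 0

def tdd_gate(test_path: str, write_implementation) -> bool:
    # Red: the new test must fail before any implementation exists.
    if tests_pass(test_path):
        print("rejected: test already passes, so it is not a failing test")
        return False
    write_implementation()            # write the minimal code to satisfy the test
    # Green: the same test must now pass.
    if not tests_pass(test_path):
        print("rejected: implementation does not make the test pass")
        return False
    subprocess.run(["git", "add", "-A"])
    subprocess.run(["git", "commit", "-m", "tdd: red -> green"])
    return True
```

Running the suite twice is what makes "it literally deletes code written before tests" enforceable: code that predates the test shows up as a premature green and fails the gate.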
Adrian Brasoveanu reposted
Charly Wargnier @DataChaz
THIS is the wildest open-source project I've seen this month. We were all hyped about @karpathy's autoresearch project automating the experiment loop a few weeks ago. (ICYMI → github.com/karpathy/autor…) But a bunch of folks just took it ten steps further and automated the entire scientific method end-to-end. It's called AutoResearchClaw, and it's fully open-source. You pass it a single CLI command with a raw idea, and it completely takes over 🤯

The 23-stage loop they designed is insane:
✦ First, it handles the literature review.
- It searches arXiv and Semantic Scholar for real papers
- Cross-references them against DataCite and CrossRef
- No fake papers make it through
✦ Second, it runs the sandbox.
- It generates the code from scratch
- If the code breaks, it self-heals
- You don't have to step in
✦ Finally, it writes the paper.
- It structures 5,000+ words into Introduction, Related Work, Method, and Experiments
- Formats the math, generates the comparison charts
- Then wraps the whole thing in official ICML or ICLR LaTeX templates

You can set it to pause for human approval, or you can just pass the --auto-approve flag and walk away.

What it spits out at the end:
→ Full academic paper draft
→ Conference-grade .tex files
→ Verified, hallucination-free citations
→ All experiment scripts and sandbox results

This is what autonomous AI agents actually look like in 2026. Free and open-source. Link to repo in 🧵 ↓
78 replies · 382 reposts · 2.4K likes · 209.3K views
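The "self-healing sandbox" stage is, at its core, a retry loop that feeds the traceback back to the generator. A sketch with a stubbed generate function standing in for the model call; nothing here is AutoResearchClaw's actual code:

```python
import subprocess
import sys
import tempfile

def generate(prompt: str, error: str | None = None) -> str:
    # Stand-in: a real system would call an LLM, including the prior
    # traceback in the prompt so the model can repair its own code.
    return "print('experiment ran')" if error is None else "print('fixed')"

def run_with_healing(prompt: str, max_attempts: int = 3) -> str:
    error = None
    for _ in range(max_attempts):
        code = generate(prompt, error)
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code)
        proc = subprocess.run([sys.executable, f.name],
                              capture_output=True, text=True, timeout=600)
        if proc.returncode == 0:
            return proc.stdout
        error = proc.stderr          # the traceback becomes next attempt's context
    raise RuntimeError(f"still failing after {max_attempts} attempts:\n{error}")

print(run_with_healing("toy experiment"))
```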
Adrian Brasoveanu reposted
Nav Toor @heynavtoor
🚨 Governments pay millions for this. Someone just open sourced it for free. It's called Crucix. It watches the entire world. And texts you when something changes. It pulls from 26 live data sources every 15 minutes and renders everything on a single Jarvis-style dashboard.

Here's what it watches:
→ Satellite fire detection (NASA)
→ Live flight tracking
→ Radiation monitoring
→ Conflict zone events
→ Economic indicators from the Fed
→ Live market prices, crypto, oil, and commodities
→ Sanctions lists
→ Social sentiment from 17 Telegram intelligence channels
→ Maritime vessel tracking
→ News from GDELT and RSS feeds

Here's what makes this one different: it's two-way. It pushes alerts to your Telegram and Discord. You text it back. Type /brief from your phone and get a full intelligence summary. Type /sweep to force a new scan. It responds like an assistant. It even generates trade ideas based on cross-domain signals.

No cloud. No subscription. No telemetry. Runs on your machine: node server.mjs. That's it. Your own intelligence terminal. This is the kind of setup that costs six figures behind closed doors. 100% Open Source. MIT License.
105 replies · 871 reposts · 6.4K likes · 493.6K views
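Strip away the dashboard and the core pattern is a scheduled poll-diff-alert loop. A sketch assuming placeholder Telegram credentials and a single stand-in source (the real project's 26 feeds and chat commands are much richer); the sendMessage endpoint is the public Telegram Bot API:

```python
import hashlib
import time
import requests

BOT_TOKEN, CHAT_ID = "YOUR_BOT_TOKEN", "YOUR_CHAT_ID"   # placeholders

SOURCES = {
    # name -> URL; stand-ins for Crucix's feeds (NASA FIRMS, GDELT, RSS, ...)
    "example_feed": "https://example.com/",
}

def snapshot(url: str) -> str:
    # Hash the body so any change, however small, is detectable.
    return hashlib.sha256(requests.get(url, timeout=10).content).hexdigest()

def alert(text: str) -> None:
    # Telegram Bot API sendMessage; Discord webhooks work the same way.
    requests.post(f"https://api.telegram.org/bot{BOT_TOKEN}/sendMessage",
                  json={"chat_id": CHAT_ID, "text": text}, timeout=10)

last: dict[str, str] = {}
while True:
    for name, url in SOURCES.items():
        try:
            current = snapshot(url)
        except requests.RequestException:
            continue                      # skip a flaky source this round
        if name in last and last[name] != current:
            alert(f"{name} changed at {time.strftime('%H:%M')}")
        last[name] = current
    time.sleep(15 * 60)                   # the tweet's 15-minute cadence
```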
Adrian Brasoveanu reposted
Rohan Paul @rohanpaul_ai
This research introduces a system that recovers the hidden information needed for computers to successfully reproduce academic experiments. Academic papers often leave out crucial details, which prevents other researchers from recreating the results in their work. This paper addresses the problem by identifying three types of missing knowledge, specifically relational, somatic, and collective details. The proposed system, named PAPERREPRO, uses a graph-based framework to automatically find and apply this missing information during reproduction. It works by analyzing relationships between the original paper and its neighbors, then uses feedback from running code to refine its understanding. This method allows AI agents to fill in the gaps that authors leave behind, making automated experimentation much more reliable. By turning these implicit details into actionable steps, the framework bridges the gap between static text and executable code.

Paper link: arxiv.org/abs/2603.01801
Paper title: "What Papers Don't Tell You: Recovering Tacit Knowledge for Automated Paper Reproduction"
6 replies · 21 reposts · 68 likes · 6.4K views
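A very rough, stdlib-only illustration of the stated idea: pull unstated hyperparameters from a paper's neighborhood, leaving room for run feedback to overwrite wrong guesses. The graph contents and every name below are invented; the actual PAPERREPRO framework is in the paper.

```python
# Hypothetical toy graph: the target paper linked to neighbor papers
# (citations, shared authors, similar methods), each edge carrying details
# the target paper itself never states.
NEIGHBORS = {
    "target_paper": [
        ("cited_baseline", {"optimizer": "AdamW", "weight_decay": 0.01}),
        ("same_lab_followup", {"lr": 3e-4, "warmup_steps": 1000}),
    ],
}

def recover_tacit_config(paper: str) -> dict:
    # Merge candidate defaults from the paper's neighborhood.
    merged: dict = {}
    for _, details in NEIGHBORS.get(paper, []):
        merged.update(details)
    return merged

config = recover_tacit_config("target_paper")
print(config)
# In the full system, tracebacks and metrics from actually running the
# reproduction code would feed back in and correct wrong entries here.
```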
Adrian Brasoveanu reposted
God of Prompt @godofprompt
🚨 BREAKING: Anthropic quietly dropped a 32-page playbook on building Claude Skills. Skills let you teach Claude your exact workflow once. It executes it every time after that. Across Claude.ai, Claude Code, and the API. No more re-explaining. No more inconsistent output. This is how AI goes from chatbot to custom operating system. PDF: resources.anthropic.com/hubfs/The-Comp…
44 replies · 353 reposts · 2.3K likes · 524.9K views
Adrian Brasoveanu reposted
Sebastian Raschka @rasbt
Another week, another noteworthy open-weight LLM release. Nvidia’s Nemotron 3 Super 120B-A12B looks pretty good. Benchmarks are on par with Qwen3.5 122B and GPT-OSS 120B, but the throughput is great! Below is a short, visual architecture rundown.
37 replies · 125 reposts · 786 likes · 35.8K views
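The "A12B" suffix means only about 12B of the 120B parameters are active per token, i.e. a sparse mixture-of-experts. A toy top-k MoE layer in PyTorch for intuition; the dimensions, expert count, and linear experts below are illustrative, not Nemotron's actual architecture:

```python
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    def __init__(self, dim=64, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
        self.k = k

    def forward(self, x):                              # x: (tokens, dim)
        scores = self.router(x)                        # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)     # route each token to k experts
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):                     # only k of n_experts run per token
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out

print(TopKMoE()(torch.randn(4, 64)).shape)   # torch.Size([4, 64])
```

Total parameters scale with n_experts while per-token compute scales with k, which is how a 120B model can have the throughput profile of a ~12B one.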
Adrian Brasoveanu reposted
0xMarioNawfal @RoundtableSpace
SOMEONE CREATED A GITHUB REPO WITH AN ENTIRE SETUP FOR AN AI AGENCY. Engineers, designers, growth marketers, product managers. Broken down so that even a rookie could understand it. It has gained over 10K stars in 7 days. GitHub: github.com/msitarzewski/a…
228 replies · 1.3K reposts · 11K likes · 1.4M views
Adrian Brasoveanu reposted
Millie Marconi @MillieMarconnni
This is wild... Someone just open-sourced the Claude Code setup that won an Anthropic hackathon. You're getting:
→ Agents + skills pre-configured
→ Custom hooks, commands, and rules
→ MCP servers ready to go
→ PM2 + multi-agent orchestration
→ 6 new commands out of the box
Stop copy-pasting random configs. Just clone this. (Link in comments)
58 replies · 152 reposts · 1.4K likes · 124.6K views
Adrian Brasoveanu reposted
Christopher Manning @chrmanning
Here's a piece by @goodfellow_ian, @sunfanyun, and me arguing that the use of symbolic representations and virtual game-world data offers the best path to building action-conditioned multimodal world models that enable reliable prediction and planning for long-horizon tasks. The advantages of symbolic language representations in this context were also recently argued for in new work from @physical_int: pi.website/research/memory.
Quoted: Moonlake @moonlake
x.com/i/article/2029…
11 replies · 91 reposts · 687 likes · 96.4K views
Adrian Brasoveanu reposted
Simplifying AI @simplifyinAI
🚨 BREAKING: Stanford and Harvard just published the most unsettling AI paper of the year. It's called "Agents of Chaos," and it proves that when autonomous AI agents are placed in open, competitive environments, they don't just optimize for performance. They naturally drift toward manipulation, collusion, and strategic sabotage. It's a massive, systems-level warning.

The instability doesn't come from jailbreaks or malicious prompts. It emerges entirely from incentives. When an AI's reward structure prioritizes winning, influence, or resource capture, it converges on tactics that maximize its advantage, even if that means deceiving humans or other AIs.

The core tension: local alignment ≠ global stability. You can perfectly align a single AI assistant. But when thousands of them compete in an open ecosystem, the macro-level outcome is game-theoretic chaos.

Why this matters right now: this applies directly to the technologies we are currently rushing to deploy:
→ Multi-agent financial trading systems
→ Autonomous negotiation bots
→ AI-to-AI economic marketplaces
→ API-driven autonomous swarms

The takeaway: everyone is racing to build and deploy agents into finance, security, and commerce. Almost nobody is modeling the ecosystem effects. If multi-agent AI becomes the economic substrate of the internet, the difference between coordination and collapse won't be a coding issue; it will be an incentive design problem.
936 replies · 6.1K reposts · 17.7K likes · 5.1M views
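The "local alignment ≠ global stability" tension has the classic social-dilemma structure. Here is a textbook toy (not the paper's experimental setup): two reward-maximizing agents repeatedly best-respond in a prisoner's-dilemma payoff matrix and settle into mutual defection, even though mutual cooperation pays more in aggregate:

```python
# payoff[(my_move, their_move)] -> my reward (C = cooperate, D = defect/manipulate)
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def best_response(their_move: str) -> str:
    # A perfectly "locally aligned" agent: it just maximizes its own reward.
    return max("CD", key=lambda m: PAYOFF[(m, their_move)])

a, b = "C", "C"                      # both start out cooperative
for round_ in range(5):
    a, b = best_response(b), best_response(a)
    total = PAYOFF[(a, b)] + PAYOFF[(b, a)]
    print(f"round {round_}: a={a} b={b} ecosystem total={total}")

# Defection strictly dominates, so play collapses to (D, D) with total 2,
# versus total 6 for the cooperative outcome the ecosystem would prefer.
```

Each agent's choice is individually optimal at every step; the bad outcome lives entirely in the incentive structure, which is the paper's framing of the problem.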
Adrian Brasoveanu reposted
Mark Worrall @infinitehumanai
Reminds me of Peter Naur's classic 1985 essay "Programming as Theory Building", which argues that a program is not its source code. A program is a shared mental construct (he uses the word theory) that lives in the minds of the people who work on it. If you lose the people, you lose the program. The code is merely a written representation of the program, and it's lossy, so you can't reconstruct a program from its code.

If you think of total software debt as technical debt + cognitive debt, then previously we mostly had technical debt. Now with AI we have both. Previously, when you built something, you accumulated technical debt but relatively little cognitive debt, because you had to understand what you were building in order to build it. In other words: the theory came for free as a byproduct of the work.

AI breaks that coupling. Now you can produce code without building the theory. So you're now able to accumulate both kinds of debt simultaneously: technical debt in the code and cognitive debt in yourself. And cognitive debt is arguably worse, because you can fool yourself into believing it doesn't exist. Technical debt tends to show up in semi-obvious ways that we understand well as an industry. Cognitive debt is more insidious: it means you're unable to even reason about the program (because you possess no theory of it), which is what Naur describes as the "death" of a program.
10 replies · 48 reposts · 235 likes · 17.5K views
Adrian Brasoveanu reposted
Jeremy Howard @jeremyphoward
A listener has created this detailed vocabulary and set of linked references for anyone interested in diving deeper: share.solve.it.com/d/28d1864aad07…
Quoted: Machine Learning Street Talk @MLStreetTalk
A masterclass from @jeremyphoward on why AI coding tools can be a trap, and what 45 years of programming taught him that most vibe coders will never learn.
- AI coding tools exploit gambling psychology
- The difference between typing code and software engineering
- Enterprise coding AND prompt-only vibe coding are "inhumane", i.e. they disconnect humans from understanding-building
- AI tools remove the "desirable difficulty" you need to build deep mental models
Out on MLST now!

12 replies · 58 reposts · 374 likes · 53K views
Adrian Brasoveanu reposted
Nav Toor @heynavtoor
🚨BREAKING: OpenAI published a paper proving that ChatGPT will always make things up. Not sometimes. Not until the next update. Always. They proved it with math.

Even with perfect training data and unlimited computing power, AI models will still confidently tell you things that are completely false. This isn't a bug they're working on. It's baked into how these systems work at a fundamental level.

And their own numbers are brutal. OpenAI's o1 reasoning model hallucinates 16% of the time. Their newer o3 model? 33%. Their newest o4-mini? 48%. Nearly half of what their most recent model tells you could be fabricated. The "smarter" models are actually getting worse at telling the truth.

Here's why it can't be fixed. Language models work by predicting the next word based on probability. When they hit something uncertain, they don't pause. They don't flag it. They guess. And they guess with complete confidence, because that's exactly what they were trained to do.

The researchers looked at the 10 biggest AI benchmarks used to measure how good these models are. 9 out of 10 give the same score for saying "I don't know" as for giving a completely wrong answer: zero points. The entire testing system literally punishes honesty and rewards guessing. So the AI learned the optimal strategy: always guess. Never admit uncertainty. Sound confident even when you're making it up.

OpenAI's proposed fix? Have ChatGPT say "I don't know" when it's unsure. Their own math shows this would mean roughly 30% of your questions get no answer. Imagine asking ChatGPT something three times out of ten and getting "I'm not confident enough to respond." Users would leave overnight. So the fix exists, but it would kill the product.

This isn't just OpenAI's problem. DeepMind and Tsinghua University independently reached the same conclusion. Three of the world's top AI labs, working separately, all agree: this is permanent.

Every time ChatGPT gives you an answer, ask yourself: is this real, or is it just a confident guess?
1.4K replies · 8.9K reposts · 33.8K likes · 3.2M views
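The benchmark-scoring argument reduces to one line of expected-value arithmetic: if wrong answers and "I don't know" both score zero, guessing weakly dominates abstaining at any nonzero chance of being right. A sketch (the 9-of-10 benchmark figure is the tweet's; the probabilities below are arbitrary):

```python
def expected_score(p_correct: float, abstain: bool) -> float:
    # Per the tweet, 9 of 10 benchmarks score a wrong answer and
    # "I don't know" identically (0), and a correct answer as 1.
    return 0.0 if abstain else p_correct

for p in (0.05, 0.30, 0.60):
    print(f"p(correct)={p:.2f}  guess={expected_score(p, False):.2f}  "
          f"abstain={expected_score(p, True):.2f}")

# Guessing wins for any p > 0. A rule that penalized errors (say -1 for
# a wrong answer) would make abstaining optimal whenever p < 0.5.
```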
Adrian Brasoveanu reposted
Hasan Toor @hasantoxr
🚨BREAKING: Microsoft Research + Salesforce just dropped a paper that should scare every AI builder. They tested 15 top LLMs (GPT-4.1, Gemini 2.5 Pro, Claude 3.7 Sonnet, o3, DeepSeek R1, Llama 4) across 200,000+ simulated conversations.

Single-turn prompt: 90% performance. Multi-turn conversation: 65% performance. Same model. Same task. Just... talking normally.

The culprit isn't intelligence. Aptitude only dropped 15%. Unreliability EXPLODED by 112%.
→ LLMs answer before you finish explaining (wrong assumptions get baked in permanently)
→ They fall in love with their first wrong answer and build on it
→ They forget the middle of your conversation entirely
→ Longer responses introduce more assumptions = more errors

Even reasoning models failed. o3 and DeepSeek R1 performed just as badly. Extra thinking tokens did nothing. Setting temperature to 0? Still broken.

The fix right now: give your AI everything upfront in one message instead of back-and-forth.

Every benchmark you've seen was tested on single-turn prompts in perfect lab conditions. Real conversations break every model on the market, and nobody's talking about it.
700 replies · 1.7K reposts · 9K likes · 1.6M views
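The suggested fix is mechanical: collapse the drip-fed requirements into one fully specified first message. A sketch using the common chat-message list shape; the requirements below are invented examples:

```python
requirements = [
    "Write a Python function that parses ISO-8601 dates.",
    "Return None on invalid input instead of raising.",
    "Include type hints and a docstring.",
    "Target Python 3.11; no third-party libraries.",
]

# Fragile, per the paper: one requirement per user turn, so early answers
# bake in assumptions before the later constraints arrive.
multi_turn = [{"role": "user", "content": r} for r in requirements]

# Robust: every requirement stated upfront in a single message.
single_turn = [{
    "role": "user",
    "content": "Task:\n" + "\n".join(f"- {r}" for r in requirements),
}]

print(single_turn[0]["content"])
```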
Adrian Brasoveanu reposted
Sam Altman @sama
Peter Steinberger is joining OpenAI to drive the next generation of personal agents. He is a genius with a lot of amazing ideas about the future of very smart agents interacting with each other to do very useful things for people. We expect this will quickly become core to our product offerings. OpenClaw will live in a foundation as an open source project that OpenAI will continue to support. The future is going to be extremely multi-agent and it's important to us to support open source as part of that.
5K replies · 4.3K reposts · 46.5K likes · 16.7M views